AWS Summit
Knowledge Share
- ASHISH MRIG
HTTPS://WWW.LINKEDIN.COM/IN/ASHISHMRIG/
AWS Athena: New Features
New JDBC/ODBC drivers released which are
2-5x faster
Supports CTAS
The drivers integrate with MS Active Directory for
access control (in lieu of access keys)
Supports Views
Introduced Athena Work Groups (in beta)
Athena Work Groups
Can be defined to identify different types of
workload or teams
Integrated with CloudWatch, which allows metrics to be collected at the workgroup level
Cost control: a query threshold limit can be set for each workgroup on usage (GB) or time
Alerts can be triggered when a threshold is breached, with an option to fail the query as well
Option to disable workgroups
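As a rough sketch of the cost-control settings above, a per-query data limit can be expressed as a workgroup configuration. The helper name `workgroup_request`, the workgroup name, and the S3 output location are hypothetical. The field names are assumed to follow the boto3 Athena `create_work_group` request shape and should be checked against the current API reference before use.

```python
def workgroup_request(name, output_location, max_gb_per_query):
    """Build keyword arguments for boto3's athena.create_work_group.

    This only builds the request dictionary; nothing is sent to AWS.
    """
    if max_gb_per_query <= 0:
        raise ValueError("max_gb_per_query must be positive")
    return {
        "Name": name,
        "Description": f"Queries limited to {max_gb_per_query} GB scanned",
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            # Make clients use the workgroup settings instead of their own.
            "EnforceWorkGroupConfiguration": True,
            # Publish per-workgroup metrics to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
            # Upper limit on bytes scanned by a single query.
            "BytesScannedCutoffPerQuery": int(max_gb_per_query * 1024 ** 3),
        },
    }


# Hypothetical workgroup name and bucket, for illustration only.
request = workgroup_request(
    "analytics-team", "s3://example-bucket/athena-results/", 10
)
# With credentials configured, this could be sent as:
#   boto3.client("athena").create_work_group(**request)
```

Keeping the request as a plain dictionary makes the limit easy to review and test before anything is created in an account.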
New Features in AWS
Feature: Description
EMR Notebooks: An EMR notebook is a "serverless" Jupyter notebook executed using an EMR cluster and managed via the EMR console. A notebook can be attached to any EMR cluster and is stored in S3
AWS Glue: Support for Hive, Spark & Presto
S3 Select: Available in Spark, Java, Python. Objects must be in CSV, JSON, or Parquet with UTF-8 encoding
AWS Textract (in beta): Automatically extracts text and data from scanned documents, including data stored in tables & forms
Predictive Scaling: ML-based feature which will try to predict the workload and scale accordingly
Serverless Aurora (in beta): Serverless offering of Aurora, similar to Athena
AWS Glue
Crawlers for automatic data discovery. Auto generate
schema & partitions
Generates Python/Scala code which can be customized
Job bookmarks – keep track of data that is already
processed
Has a built-in scheduler and is integrated with AWS CloudWatch for notifications
Catalog can be shared between Athena & Redshift Spectrum
Fine-grained catalog permissions at the table or connection level
AWS Auto Scaling
Free service to scale EC2/ECS/Aurora/DynamoDB
Scaling can be defined on any CloudWatch metric including
custom metric
Typical metrics used: CPU Utilization, memory, incoming
traffic
Scaling Options: Manual/Scheduled/Dynamic
New option added: Predictive Scaling
Uses ML based on historical patterns of resource utilization
Daily forecasting at a 60-minute grain
Data Lake Architecture:
Best Practices
Build decoupled systems: future-proof
Design for the ability to scale indefinitely with new business
Focus on core competencies, reduce dependencies on
managing or building infrastructure.
Be very cost conscious, build ‘pay-for-what-you-use’
architecture
Enable your application to leverage ML
Data Lake Architecture:
Best Practices
Design and build for multi-tenancy
Consolidate small files before loading into S3
For full data scans, use the Avro file format
Always preserve the raw data in S3 Infrequent Access (IA) or Glacier
Use automated test suites on every
release/commit
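The advice to consolidate small files before loading them into S3 can be sketched as follows. This is a minimal illustration that assumes headerless, line-oriented files (for example CSV without a header row, or JSON Lines). The function name `consolidate` and the demo file names are hypothetical, and the upload step is left out.

```python
import os
import tempfile


def consolidate(paths, out_path):
    """Concatenate small line-oriented files into one file.

    Returns the number of bytes written.
    """
    total = 0
    with open(out_path, "wb") as out:
        for path in sorted(paths):
            with open(path, "rb") as src:
                data = src.read()
            # Make sure each part ends with a newline so records do not fuse.
            if data and not data.endswith(b"\n"):
                data += b"\n"
            out.write(data)
            total += len(data)
    return total


# Demo with three tiny files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i in range(3):
        part = os.path.join(tmp, f"part-{i}.csv")
        with open(part, "w") as f:
            f.write(f"{i},value{i}")
        parts.append(part)
    merged = os.path.join(tmp, "merged.csv")
    written = consolidate(parts, merged)
    with open(merged) as f:
        merged_lines = f.read().splitlines()
```

Fewer, larger objects generally mean fewer requests and less per-file overhead for engines that scan the data later.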
Best Practices for
Data Lake Security
Encrypt data at rest (KMS) and in transit (SSL)
Set ownership of S3 buckets at the user/team level; this reduces the attack surface
Disable S3 delete using IAM roles
Buckets should always be created per business domain and should have security policies baked in
Back up data across regions
Allow S3 access based on tags (e.g. Redshift, HIPAA query) or IAM roles (dev/DS)
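One way to bake such rules into a bucket is a bucket policy. The sketch below is an assumption-laden illustration, not the approach from the talk: it uses a bucket policy rather than IAM roles, the bucket name is hypothetical, and a blanket deny like this also blocks administrators, so a real policy would normally carve out exceptions.

```python
import json


def bucket_policy(bucket):
    """Return a bucket policy dict that denies deletes and non-TLS access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Block object deletion for every principal.
                "Sid": "DenyObjectDelete",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Reject requests that do not use TLS (encryption in transit).
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Hypothetical bucket name, for illustration only.
policy_json = json.dumps(bucket_policy("example-data-lake-raw"), indent=2)
```

The resulting JSON document is what would be attached to the bucket; generating it from code keeps the same rules consistent across buckets.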
Best Practices for
Data Lake Security
Use AWS Config to detect & notify on any S3 policy
changes
Use AWS Macie to detect & classify PII and sensitive
data
Control data access through views, don’t expose the
core tables directly
Use & leverage centralized data catalogs
EMR Best Practices
Run stateless: the metastore should be remote (MySQL or Glue)
Use a combination of Spot Instances to reduce cost (design for re-runnability)
Don’t specify the Availability Zone, so that the cheapest instances can be used
A single Spot node termination will not interrupt the cluster (new feature: graceful decommissioning)
Build instance fleets with a mix of different instance types (c5/r5..) and different markets (Spot/On-Demand)
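The instance fleet advice above might look like the following when written as one entry of an EMR cluster request. The helper name `task_fleet`, the capacities, the timeout, and the exact instance sizes are illustrative assumptions. The field names are assumed to match the `InstanceFleets` structure accepted by boto3's EMR `run_job_flow` and should be verified against the API reference.

```python
def task_fleet(instance_types, on_demand_units, spot_units):
    """Build one entry for the InstanceFleets list of an EMR cluster request."""
    return {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        # Mix of markets: some On-Demand capacity, the rest Spot.
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        # Mix of instance types that EMR may choose from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If Spot capacity is not fulfilled in time, use On-Demand.
                "TimeoutDurationMinutes": 10,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


# Illustrative values: two instance families, 2 On-Demand and 6 Spot units.
fleet = task_fleet(["c5.xlarge", "r5.xlarge"], on_demand_units=2, spot_units=6)
# The dict would go under Instances["InstanceFleets"] in run_job_flow.
# Listing several subnets there, rather than one Availability Zone,
# leaves the placement choice to EMR.
```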
Choosing the Right DB
The days of one-size-fits-all DBs are over
DBs have become specialized based on their use case; different types:
Relational Key-Value MPP In-Memory
Document Graph Time-Series Ledger
Columnar Distributed Object
E.g. it is better (faster and cheaper) to use a time-series DB than an RDBMS for storing and plotting time-based data
AWS Quantum Ledger DB
(QLDB)
Keeps track of transparent, immutable, and
cryptographically verifiable transaction log data over
distributed ledgers
Every entry is written to a journal and cannot be changed. The journal is append-only and maintains two states: Current and History
Each transaction generates a digest using a cryptographic hash function (SHA-256), which guarantees integrity
Serverless, SQL support & ACID compliant
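The idea of an append-only journal whose entries are protected by SHA-256 digests can be illustrated with a simple hash chain. This is a toy sketch of the general technique, not QLDB's actual journal format or API. The class name `Journal` and the sample payloads are made up for the example.

```python
import hashlib
import json


class Journal:
    """Append-only log in which each digest also covers the previous digest."""

    def __init__(self):
        self._entries = []  # list of (payload_json, digest_hex) tuples

    def append(self, payload):
        """Add an entry and return its SHA-256 digest."""
        prev = self._entries[-1][1] if self._entries else ""
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._entries.append((body, digest))
        return digest

    def verify(self):
        """Recompute every digest; return False if any entry was altered."""
        prev = ""
        for body, digest in self._entries:
            expected = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            prev = digest
        return True


journal = Journal()
journal.append({"account": "A", "amount": 100})
journal.append({"account": "A", "amount": 75})
intact = journal.verify()

# Simulate tampering with the first entry while keeping its old digest.
journal._entries[0] = ('{"account": "A", "amount": 999}', journal._entries[0][1])
tampered_detected = not journal.verify()
```

Because each digest depends on the one before it, changing any historical entry breaks verification from that point on.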
Machine Learning
Services
1. AWS SageMaker
Fully managed service to build, train, and deploy machine learning models.
Out-of-the-box optimization for the following ML packages: TensorFlow, Apache MXNet, PyTorch, Chainer, Scikit-learn, SparkML, Horovod, Keras, and Gluon
Supports: supervised, unsupervised, and reinforcement learning
Supported by EC2 P3dn.24xlarge (8 Tesla V100 GPUs)
Machine Learning
Services (contd..)
Framework & model agnostic (use from pre-
trained model library or bring your own)
Integrated with Jupyter notebook
On demand and scalable training clusters
Integrated with different AWS services like Lambda, API Gateway, CloudWatch, etc.
Machine Learning
Services (contd..)
2. AWS SageMaker Ground Truth
Most of the work in ML is spent labeling the training data
This service can significantly reduce the time, effort, and cost required to create datasets for training
3. AWS SageMaker Neo
Compiles trained ML models so they can be deployed on a range of hardware platforms
AWS Data Services
Service: Description
Data Migration Service: Transfers data to the AWS cloud; supports homogeneous migrations such as Oracle to Oracle, and heterogeneous migrations such as Oracle to Aurora
AWS Macie: A security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS, such as PII and IP. It provides dashboards and alerts
AWS Direct Connect: Lets you establish a dedicated network connection between your network and one of the AWS facilities. It saves on bandwidth cost and provides consistent network performance
AWS Snowball: An 80 TB physical device shipped to the client location, where the data is copied onto it; it is then shipped back to an AWS facility to be loaded into AWS. Good for petabyte-scale data copies, saving money on transfer cost
Netflix Push Messaging
Case Study
Netflix had a polling infrastructure to poll all its client interfaces
Polling is inherently inefficient; they were able to reduce web traffic by 12% by switching to push
They have open-sourced the complete push messaging framework: Zuul
Available on GitHub: https://github.com/Netflix/zuul/wiki/Push-Messaging
Netflix Case Study
Interesting Challenges
Zuul uses persistent connections, which makes it stateful; this makes deployments difficult
Solved by using a cluster swap; however, this created the ‘thundering herd’ problem (everyone trying to connect at the same time)
Resolved by introducing a connection lifetime (~30 min) and randomizing it
The Zuul push cluster can auto-scale based on the number of open connections
AWS allows auto-scaling based on any metric defined in CloudWatch
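The randomized connection lifetime can be sketched in a few lines. The ~30 minute base comes from the slide. The 20% spread, the uniform distribution, and the function name `connection_lifetime` are assumptions made for illustration and are not taken from Zuul's code.

```python
import random

BASE_LIFETIME_S = 30 * 60  # ~30 minutes, as mentioned in the talk
JITTER_FRACTION = 0.2      # assumed spread; not from the talk


def connection_lifetime(rng=random):
    """Return a lifetime in seconds, spread uniformly around the base value."""
    spread = BASE_LIFETIME_S * JITTER_FRACTION
    return BASE_LIFETIME_S + rng.uniform(-spread, spread)


# With a fixed lifetime, clients that connected together would all reconnect
# together. With jitter, their reconnects are spread over a window instead.
rng = random.Random(42)  # seeded only to make the demo repeatable
lifetimes = [connection_lifetime(rng) for _ in range(1000)]
earliest, latest = min(lifetimes), max(lifetimes)
```

Here 1,000 clients that connected at the same moment would reconnect over a window of up to 12 minutes rather than in a single burst.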
AWS Forecast
(in beta)
Predicts future points in a time series given historical data
Uses deep learning models developed by Amazon
Accuracy is the cornerstone of any forecasting; this service is claimed to be 50% more accurate than traditional methods
It comes with 8 pre-packaged models: 5 custom-built algorithms and 3 traditional ones for benchmarking
Inputs: historical time series (e.g. electricity consumption per year), any related data (weather), metadata (location)
Output: ability to visualize the forecast and export it via an API
All forecasting is probabilistic for a specific prediction interval, including a margin of error
AWS Forecast
Models
Traditional: Exponential Smoothing, ARIMA, Prophet
Amazon: Auto Regressive LSTM, Spline Quantile Forecaster, Multi Horizon Quantile (MQ-RNN)
Pricing
Cost Type: Pricing
Generated forecasts: $0.60 per 1,000 forecasts
Data storage: $0.088 per GB
Training hours: $0.238 per hour
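A quick worked estimate using the prices listed above. The workload figures (50,000 forecasts, 20 GB stored, 10 training hours) are hypothetical, and the storage price is applied once per GB because the slide does not state a billing period.

```python
# Prices as listed on the slide (beta-era figures; they may have changed).
PRICE_PER_1000_FORECASTS = 0.60
PRICE_PER_GB_STORED = 0.088
PRICE_PER_TRAINING_HOUR = 0.238


def forecast_cost(forecasts, storage_gb, training_hours):
    """Estimate the total cost in USD for one billing period."""
    return round(
        forecasts / 1000 * PRICE_PER_1000_FORECASTS
        + storage_gb * PRICE_PER_GB_STORED
        + training_hours * PRICE_PER_TRAINING_HOUR,
        2,
    )


# Hypothetical workload: 30.00 for forecasts + 1.76 storage + 2.38 training.
example = forecast_cost(forecasts=50_000, storage_gb=20, training_hours=10)
print(example)  # 34.14
```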
New Terms Learnt
Term: Meaning
Dark Data: Data that is hidden in files or otherwise not accessible to the enterprise
Data Ponds: Data that lives in silos across the enterprise
Blast Radius: Impact of a deployment, e.g. a microservice deployment will have a smaller blast radius than a monolithic API
Thundering Herd: When everyone tries to connect at the same time, e.g. if your service goes down and, after it is restored, all users try to connect simultaneously, overwhelming the system
Data Decay: The value of data decreases over time; data is most valuable near its creation
Accelerate Innovation & Maximize Business Value w/
Serverless Apps
More Info
AWS Slide Deck
AWS Videos
AWS re:Invent Recap
In 2019: Dec 2 – Dec 6 @ Las Vegas

DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

AWS Summit 2018 Summary

  • 1. AWS Summit Knowledge Share - ASHISH MRIG HTTPS://WWW.LINKEDIN.COM/IN/ASHISHMRIG/
  • 2. AWS Athena: New Features
      • New JDBC/ODBC drivers released, which are 2-5x faster
      • The drivers integrate with MS Active Directory for access control (in lieu of access keys)
      • Supports CTAS (CREATE TABLE AS SELECT)
      • Supports Views
      • Introduced Athena Work Groups (in beta)
  • 3. Athena Work Groups
      • Can be defined to identify different types of workloads or teams
      • Integrated with CloudWatch, which allows metrics to be collected at the work group level
      • Cost control: a query threshold limit can be set for each work group on usage (GB scanned) or time
      • Can trigger alerts when a threshold is breached, with an option to fail the query as well
      • Option to disable work groups
  • 5. New Features in AWS (Feature: Description)
      • EMR Notebooks: a "serverless" Jupyter notebook executed using an EMR cluster and managed via the EMR console. A notebook can be attached to any EMR cluster and is stored in S3
      • AWS Glue: support for Hive, Spark & Presto
      • S3 Select: available in Spark, Java, Python. Objects must be in CSV, JSON, or Parquet with UTF-8 encoding
      • AWS Textract (in beta): automatically extracts text and data from scanned documents, including data stored in tables & forms
      • Predictive Scaling: ML-based feature which tries to predict the workload and scale accordingly
      • Serverless Aurora (in beta): serverless offering of Aurora, similar to Athena
  • 6. AWS Glue Crawlers for automatic data discovery. Auto generate schema & partitions Generates Python/Scala code which can be customized Job bookmarks – keep track of data that is already processed Has built-in in Scheduler and integrated with AWS CloudWatch for notification Catalog can be shared Athena & Redshift Spectrum Fine grained catalog permission at Table or Connection
  • 7. AWS Auto Scaling
      • Free service to scale EC2/ECS/Aurora/DynamoDB
      • Scaling can be defined on any CloudWatch metric, including custom metrics
      • Typical metrics used: CPU utilization, memory, incoming traffic
      • Scaling options: Manual/Scheduled/Dynamic
      • New option added: Predictive Scaling
          • Uses ML based on historical patterns of resource utilization
          • Daily forecasting at a 60-minute grain
  • 8. Data Lake Architecture: Best Practices
      • Build decoupled systems: future-proof
      • Design for the ability to scale indefinitely with new business
      • Focus on core competencies; reduce dependencies on managing or building infrastructure
      • Be very cost conscious; build a 'pay-for-what-you-use' architecture
      • Enable your application to leverage ML
  • 9. Data Lake Architecture: Best Practices
      • Design and build for multi-tenancy
      • Consolidate small files before loading into S3
      • For full data scans, use the Avro file format
      • Always preserve the raw data in IA (Infrequent Access) or Glacier
      • Use automated test suites on every release/commit
  • 10. Best Practices for Data Lake Security
      • Encrypt data at rest (KMS) and in transit (SSL)
      • Set ownership of S3 buckets at the user/team level; this reduces the attack surface
      • Disable S3 delete using IAM roles
      • Buckets should always be created per business domain and should have security policies baked in
      • Back up data across regions
      • Allow S3 access based on tags (e.g. redshift, HIPAA query) or IAM roles (dev/DS)
  • 11. Best Practices for Data Lake Security
      • Use AWS Config to detect & notify on any S3 policy changes
      • Use AWS Macie to detect & classify PII and sensitive data
      • Control data access through views; don't expose the core tables directly
      • Use & leverage centralized data catalogs
  • 12. EMR Best Practices
      • Run stateless: the metastore should be remote (MySQL or Glue)
      • Use a combination of spot instances to reduce cost (design for re-runnability)
      • Don't specify the Availability Zone, so that you get the cheapest instances
      • A single spot node termination will not interrupt the cluster (new feature: graceful decommissioning)
      • Build instance fleets with a mix of different instance types (c5/r5..) and different markets (spot/on-demand)
  • 13. Choosing the Right DB
      • The days of one-size-fits-all DBs are over
      • DBs have become specialized based on their use case. E.g. it is better (faster & cheaper) to use a time-series DB than an RDBMS for storing & plotting time-based data
      • Different types: Relational, Key-Value, MPP, In-Memory, Document, Graph, Time-Series, Ledger, Columnar, Distributed Object
  • 14. AWS Quantum Ledger DB (QLDB)
      • Keeps a transparent, immutable, and cryptographically verifiable transaction log over distributed ledgers
      • Every entry is written into a journal and cannot be changed
      • The journal is append-only and maintains two states: Current & History
      • Each transaction generates a digest using a cryptographic hash function (SHA-256), which guarantees integrity
      • Serverless, SQL support & ACID compliant
  • 15. Machine Learning Services
      1. AWS SageMaker
      • Fully managed service to build/train/deploy machine learning models
      • Supports: supervised, unsupervised & reinforcement learning
      • Supported by EC2 P3dn.24xlarge (8 Tesla V100 GPUs)
      • Out-of-the-box optimization for the following ML packages: TensorFlow, Apache MXNet, PyTorch, Chainer, Scikit-learn, SparkML, Horovod, Keras, and Gluon
  • 16. Machine Learning Services (contd..)
      • Framework & model agnostic (use a model from the pre-trained model library or bring your own)
      • Integrated with Jupyter notebook
      • On-demand and scalable training clusters
      • Integrated with different AWS services like Lambda, API Gateway, CloudWatch etc.
  • 17. Machine Learning Services (contd..)
      2. AWS SageMaker Ground Truth
      • Most of the effort in ML work is spent labeling the training data
      • This package can significantly reduce the time, effort and cost required to create datasets for training
      3. AWS SageMaker Neo
      • Container to deploy ML models on any hardware or application
  • 18. AWS Data Services (Service: Description)
      • Data Migration Service: transfers data to the AWS cloud; supports homogeneous migrations such as Oracle to Oracle, and heterogeneous migrations such as Oracle to Aurora
      • AWS Macie: a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS, such as PII and intellectual property (IP). It provides dashboards and alerts
      • AWS Direct Connect: lets you establish a dedicated network connection between your network and an AWS facility. It saves on bandwidth cost and provides consistent network performance
      • AWS Snowball: an 80 TB physical device shipped to the client location, where the data is copied onto it; it is then shipped back to an AWS facility, where the data is loaded into AWS. Good for petabyte-scale data copies, to save money on transfer cost
  • 19. Netflix Push Messaging Case Study
      • Netflix had a polling infrastructure to poll all its client interfaces
      • Polling is inherently inefficient; they were able to reduce web traffic by 12% by switching to push
      • They have open sourced the complete push messaging framework: Zuul
      • Available on GitHub: https://github.com/Netflix/zuul/wiki/Push-Messaging
  • 20. Netflix Case Study: Interesting Challenges
      • Zuul uses persistent connections, which makes it stateful; this makes deployments difficult
      • Solved by using a cluster swap; however, this created the 'Thundering herd' problem (everyone trying to connect at the same time)
      • Resolved by introducing a connection lifetime (~30 min) and randomizing it
      • The Zuul push cluster can auto-scale based on the number of open connections
      • AWS allows auto-scaling based on any metric defined in CloudWatch
  • 21. AWS Forecast (in beta)
      • Predicts future points in a time series given historical data
      • Uses deep learning models developed by Amazon
      • Accuracy is the cornerstone of any forecasting; this service is 50% more accurate than traditional methods
      • It comes with 8 pre-packaged models: 5 custom-built algorithms and 3 traditional ones for benchmarking
      • Inputs: historical time series (e.g. electricity consumption per year), any related data (weather), metadata (location)
      • Output: ability to visualize the forecast and export it via an API
      • All forecasting is probabilistic for a specific prediction interval, including margin of error
  • 22. AWS Forecast Models
      • Traditional: Exponential Smoothing, ARIMA, Prophet
      • Amazon: Auto Regressive LSTM, Spline Quantile Forecaster, Multi Horizon Quantile (MQ-RNN)
      • Pricing (Cost Type: Pricing)
          • Generated forecasts: $0.60 per 1,000 forecasts
          • Data storage: $0.088 per GB
          • Training hours: $0.238 per hour
  • 23. New Terms Learnt (Term: Meaning)
      • Dark Data: data that is hidden in files or otherwise not accessible to the enterprise
      • Data Ponds: data that lives in silos across the enterprise
      • Blast Radius: impact of a deployment, e.g. a microservice deployment will have a smaller blast radius than a monolithic API
      • Thundering Herd: when everyone tries to connect at the same time, e.g. if your service goes down and, after it is restored, all users try to connect simultaneously, overwhelming the system
      • Data Decay: the value of data decreases over time; data is most valuable near its creation
  • 27. Accelerate Innovation & Maximize Business Value w/ Serverless Apps
  • 33. More Info
      • AWS Slide Deck
      • AWS Videos
      • AWS re:Invent Recap
      • In 2019: Dec 2 – Dec 6 @ Las Vegas
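
Slide 9 recommends consolidating small files before loading them into S3. A minimal sketch of that step in Python follows. It assumes the inputs are headerless, newline-terminated CSV files in one local directory; the function name, the "part-NNNNN.csv" naming and the size threshold are illustrative choices, not something from the talk. The upload to S3 itself is left out.

```python
from pathlib import Path
from typing import BinaryIO, List, Optional


def consolidate_small_files(src_dir: Path, dest_dir: Path, target_bytes: int) -> List[Path]:
    """Concatenate small CSV files from src_dir into larger part files in dest_dir.

    A new part file is started whenever adding the next input would push the
    current part past target_bytes. Assumes headerless, newline-terminated CSVs.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    current: Optional[BinaryIO] = None
    written = 0
    try:
        for small in sorted(src_dir.glob("*.csv")):
            data = small.read_bytes()
            if current is None or written + len(data) > target_bytes:
                if current is not None:
                    current.close()
                out_path = dest_dir / f"part-{len(outputs):05d}.csv"
                current = out_path.open("wb")
                outputs.append(out_path)
                written = 0
            current.write(data)
            written += len(data)
    finally:
        # Make sure the last part file is flushed and closed.
        if current is not None:
            current.close()
    return outputs
```

In practice the target size would be chosen to suit the query engine reading the data; the value is deliberately left as a parameter here.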
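
Slide 14 describes the QLDB journal as append-only, with a SHA-256 digest guaranteeing integrity. The sketch below illustrates that idea with a simple hash chain in plain Python. It is a toy model, not the QLDB API or its actual internal structure; the `Journal` class and its methods are hypothetical names made up for this example.

```python
import hashlib
import json
from typing import List, Tuple

GENESIS_DIGEST = "0" * 64  # placeholder "previous digest" for the first entry


class Journal:
    """Toy append-only journal: each digest covers the entry and the previous digest."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []  # (payload_json, digest_hex)

    def append(self, payload: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS_DIGEST
        payload_json = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev + payload_json).encode("utf-8")).hexdigest()
        self._entries.append((payload_json, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every digest; any edit to an earlier entry breaks the chain."""
        prev = GENESIS_DIGEST
        for payload_json, digest in self._entries:
            expected = hashlib.sha256((prev + payload_json).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            prev = digest
        return True
```

Because every digest depends on the one before it, changing any historical entry invalidates all later digests, which is what makes tampering detectable.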
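
Slide 20 mentions that the thundering herd was resolved by giving connections a lifetime of about 30 minutes and randomizing it. A minimal sketch of that randomization follows. The ±20% jitter range and the function name are assumptions for illustration; the talk only gives the ~30 minute figure, and this is not Zuul's actual code.

```python
import random

BASE_LIFETIME_S = 30 * 60  # ~30 minutes, as quoted on the slide
JITTER_FRACTION = 0.2      # assumption: spread lifetimes by +/-20%


def connection_lifetime(rng: random.Random) -> float:
    """Return a per-connection lifetime in seconds, jittered around the base value.

    Spreading the lifetimes means clients that connected at the same moment
    (e.g. right after a cluster swap) do not all reconnect at the same moment.
    """
    jitter = rng.uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return BASE_LIFETIME_S * (1.0 + jitter)
```

A server would draw one lifetime per connection when it is opened and ask the client to reconnect once that time has elapsed.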
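
The pricing on slide 22 can be turned into a quick cost estimate. The sketch below uses only the three rates quoted on the slide (beta-era figures); the function name and the sample workload are made up for illustration.

```python
PRICE_PER_1000_FORECASTS = 0.60  # USD, from the slide
PRICE_PER_GB_STORED = 0.088      # USD, from the slide
PRICE_PER_TRAINING_HOUR = 0.238  # USD, from the slide


def forecast_cost(num_forecasts: int, storage_gb: float, training_hours: float) -> float:
    """Estimate the AWS Forecast cost in USD from the three quoted rates."""
    return (
        num_forecasts / 1000 * PRICE_PER_1000_FORECASTS
        + storage_gb * PRICE_PER_GB_STORED
        + training_hours * PRICE_PER_TRAINING_HOUR
    )


if __name__ == "__main__":
    # Hypothetical workload: 10,000 forecasts, 50 GB stored, 20 training hours.
    print(f"${forecast_cost(10_000, 50, 20):.2f}")
```

For that hypothetical workload the three terms are $6.00, $4.40 and $4.76, for a total of about $15.16.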

Editor's Notes

  1. CTAS is huge; work groups are essentially resource queues
  2. CTAS is huge
  3. CTAS is huge
  4. Predictive Scaling needs up to two weeks of historical data 
  5. The key usage is bit
  6. The highest-performing GPU instance in the cloud. Reinforcement learning: the model learns by interacting with real-world scenarios
  7. The highest-performing GPU instance in the cloud. Reinforcement learning: the model learns by interacting with real-world scenarios
  8. For example, building a computer vision system that is reliable enough to identify objects - such as traffic lights, stop signs, and pedestrians - requires thousands of hours of video recordings that consist of hundreds of millions of video frames. Each one of these frames needs all of the important elements like the road, other cars, and signage to be labeled by a human before any work can begin on the model you want to develop. Amazon SageMaker Ground Truth significantly reduces the time and effort required to create datasets for training to reduce costs. These savings are achieved by using machine learning to automatically label data. The model is able to get progressively better over time by continuously learning from labels created by human labelers.
  9. They also have a best-fit option, where AWS chooses the model based on the data
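
Slide 7 and note 4 describe Predictive Scaling as a daily forecast at a 60-minute grain, built from up to two weeks of history. The sketch below shows the shape of such a forecast using a plain hour-of-day average. This is a naive stand-in chosen for illustration, not the ML model AWS uses, and the function name is hypothetical.

```python
from statistics import mean
from typing import List

HOURS_PER_DAY = 24


def forecast_next_day(hourly_history: List[float]) -> List[float]:
    """Return 24 hourly values for the next day from whole days of hourly history.

    Each forecast hour is the average of that same hour across all past days,
    e.g. 14 days of history give 14 samples per hour of the day.
    """
    if not hourly_history or len(hourly_history) % HOURS_PER_DAY != 0:
        raise ValueError("history must cover a whole number of days (24 samples each)")
    return [mean(hourly_history[h::HOURS_PER_DAY]) for h in range(HOURS_PER_DAY)]
```

A scaler could then provision capacity ahead of each hour based on the forecast value, rather than reacting after the load has already arrived.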