AWS Summit
Knowledge Share
- ASHISH MRIG
HTTPS://WWW.LINKEDIN.COM/IN/ASHISHMRIG/
AWS Athena: New Features
New JDBC/ODBC drivers released which are
2-5x faster
Supports CTAS
The drivers integrate with MS Active Directory for
access control (in lieu of access keys)
Supports Views
Introduced Athena Work Groups (in beta)
Athena Work Groups
Can be defined to identify different types of
workload or teams
Integrated with CloudWatch; allows collection
of metrics at the workgroup level
Cost control: Can set query threshold limit for each
work group on usage (GB) or time
Can create and send trigger alerts at the breach of
threshold; option to fail the query as well
Option to disable workgroups
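A minimal sketch of the cost-control idea above, assuming a per-query scan limit expressed in GB. The workgroup name, bucket path, and limit are hypothetical; the dictionary keys follow the shape of Athena's CreateWorkGroup request, and the code only builds the request without calling AWS.

```python
# Sketch: build a request for an Athena workgroup with a per-query scan limit.
# The workgroup name, bucket, and limit used below are hypothetical.

def build_workgroup_request(name, output_location, scan_limit_gb):
    """Return keyword arguments shaped for Athena's CreateWorkGroup API."""
    return {
        "Name": name,
        "Description": f"{name}: {scan_limit_gb} GB per-query scan limit",
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            # Force queries in this workgroup to use these settings.
            "EnforceWorkGroupConfiguration": True,
            # Publish per-workgroup query metrics to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
            # Queries that scan more than this many bytes are cancelled.
            "BytesScannedCutoffPerQuery": scan_limit_gb * 1024 ** 3,
        },
    }

request = build_workgroup_request(
    "analytics-team", "s3://example-bucket/athena-results/", 10
)
# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("athena").create_work_group(**request)
```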
New Features in AWS
Feature | Description
EMR Notebooks | An EMR notebook is a "serverless" Jupyter notebook executed using an EMR cluster and managed via the EMR console. Ability to attach a notebook to any EMR cluster; notebook stored in S3
AWS Glue | Support for Hive, Spark & Presto
S3 Select | Available in Spark, Java, Python. Objects must be in CSV, JSON, or Parquet with UTF-8 encoding
AWS Textract (in beta) | Automatically extracts text and data from scanned documents, including data stored in tables & forms
Predictive Scaling | ML-based feature which will try to predict the workload and scale accordingly
Serverless Aurora (in beta) | Serverless offering of Aurora, similar to Athena
AWS Glue
Crawlers for automatic data discovery. Auto generate
schema & partitions
Generates Python/Scala code which can be customized
Job bookmarks – keep track of data that is already
processed
Has a built-in scheduler and is integrated with AWS
CloudWatch for notification
Catalog can be shared with Athena & Redshift Spectrum
Fine-grained catalog permissions at the table or connection level
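The job bookmark idea (keep track of data that is already processed) can be sketched in a few lines. This is only an illustration of the concept, not how Glue implements bookmarks; the file names and the JSON bookmark file are hypothetical.

```python
import json
import os
import tempfile

def load_bookmark(path):
    """Return the set of object keys already processed (empty on first run)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))

def run_job(available_keys, bookmark_path):
    """Process only keys not seen in earlier runs, then update the bookmark."""
    done = load_bookmark(bookmark_path)
    new_keys = sorted(k for k in available_keys if k not in done)
    # ... the real transformation of new_keys would happen here ...
    with open(bookmark_path, "w") as f:
        json.dump(sorted(done | set(new_keys)), f)
    return new_keys

bookmark_file = os.path.join(tempfile.mkdtemp(), "bookmark.json")
first_run = run_job(["logs/day1.csv", "logs/day2.csv"], bookmark_file)
second_run = run_job(
    ["logs/day1.csv", "logs/day2.csv", "logs/day3.csv"], bookmark_file
)
```

The second run picks up only the file that arrived after the first run, which is the behaviour that makes a job safe to schedule repeatedly.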
AWS Auto Scaling
Free service to scale EC2/ECS/Aurora/DynamoDB
Scaling can be defined on any CloudWatch metric including
custom metric
Typical metrics used: CPU Utilization, memory, incoming
traffic
Scaling Options: Manual/Scheduled/Dynamic
New option added: Predictive Scaling
Uses ML, based on historical patterns of resource
utilization
Daily forecasting at a 60-minute grain
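To make the idea of forecasting from historical patterns at an hourly grain concrete, here is a deliberately naive sketch: it forecasts each hour as the mean of that hour on previous days and sizes the fleet to match. This averaging is a stand-in chosen for illustration, not the ML model AWS uses; the load numbers and per-instance capacity are made up.

```python
import math
from collections import defaultdict

def hourly_forecast(history):
    """history: list of (hour_of_day, load) samples from previous days.
    Returns a dict mapping each observed hour to its mean historical load."""
    buckets = defaultdict(list)
    for hour, load in history:
        buckets[hour].append(load)
    return {hour: sum(v) / len(v) for hour, v in buckets.items()}

def instances_needed(forecast_load, capacity_per_instance, minimum=1):
    """Round up so the forecast load fits within the provisioned capacity."""
    return max(minimum, math.ceil(forecast_load / capacity_per_instance))

# Hypothetical request rates observed at 03:00, 09:00 and 13:00 on two days.
history = [(9, 400), (9, 600), (13, 900), (13, 1100), (3, 50), (3, 70)]
forecast = hourly_forecast(history)
plan = {
    hour: instances_needed(load, capacity_per_instance=250)
    for hour, load in sorted(forecast.items())
}
```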
Data Lake Architecture:
Best Practices
Build decoupled systems: future-proof
Design for ability to scale indefinitely with new
business
Focus on core competencies, reduce dependencies on
managing or building infrastructure.
Be very cost conscious, build ‘pay-for-what-you-use’
architecture
Enable your application to leverage ML
Data Lake Architecture:
Best Practices
Design and build for multi-tenancy
Consolidate small files before loading into S3
For full data scans use the Avro file format
Always preserve the raw data in S3 Infrequent Access (IA) or Glacier
Use automated test suites on every
release/commit
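Consolidating small files before loading can be sketched as below. The sketch assumes newline-delimited inputs (for example JSON Lines or headerless CSV) so that plain concatenation is valid; the file names and the tiny 400-byte target are hypothetical and chosen only to keep the example small.

```python
import os
import tempfile

def consolidate(paths, out_dir, target_bytes):
    """Concatenate small files into batches of roughly target_bytes each.
    Returns the list of consolidated file paths."""
    outputs, current, size = [], None, 0
    for path in sorted(paths):
        # Start a new output file once the current one reaches the target.
        if current is None or size >= target_bytes:
            if current:
                current.close()
            out_path = os.path.join(out_dir, f"part-{len(outputs):05d}.jsonl")
            outputs.append(out_path)
            current, size = open(out_path, "wb"), 0
        with open(path, "rb") as src:
            data = src.read()
        current.write(data)
        size += len(data)
    if current:
        current.close()
    return outputs

# Create ten 100-byte input files, then merge them into ~400-byte batches.
src_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
small_files = []
for i in range(10):
    p = os.path.join(src_dir, f"event-{i:02d}.jsonl")
    with open(p, "wb") as f:
        f.write(b"x" * 99 + b"\n")
    small_files.append(p)

merged = consolidate(small_files, out_dir, target_bytes=400)
```

In practice the target would be far larger than 400 bytes; the right size depends on the query engine and file format.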
Best Practices for
Data Lake Security
Encrypt data at rest (KMS) and in transit (SSL)
Set ownership of S3 buckets at user/team level. This reduces
the attack surface
Disable S3 delete using IAM roles
Buckets should always be created per business domain and
should have security policies baked in
Back up data across regions
Allow S3 access based on tags (e.g. redshift, HIPAA query)
or IAM roles (dev/DS)
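One way to express "disable S3 delete" is an IAM policy with an explicit Deny on the delete actions. The sketch below only builds the policy document; the bucket name is hypothetical, and which roles or users the policy is attached to is left open.

```python
import json

def deny_delete_policy(bucket):
    """Build an IAM policy document that denies S3 delete actions on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectAndBucketDeletes",
                "Effect": "Deny",
                "Action": [
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:DeleteBucket",
                ],
                # The bucket itself and every object inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }

# "example-data-lake-raw" is a placeholder bucket name.
policy_json = json.dumps(deny_delete_policy("example-data-lake-raw"), indent=2)
```

An explicit Deny takes precedence over any Allow, so attaching a policy like this blocks deletes even for principals that otherwise have broad S3 permissions.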
Best Practices for
Data Lake Security
Use AWS Config to detect & notify on any S3 policy
changes
Use AWS Macie to detect & classify PII and sensitive
data
Control data access through views, don’t expose the
core tables directly
Use & leverage centralized data catalogs
EMR Best Practices
Run stateless: the metastore should be
remote (MySQL or Glue)
Use a combination of spot instances to reduce cost
(design for re-runnability).
Don’t specify the Availability Zone to get the cheapest
instances
A single spot node termination will not interrupt the
cluster (new feature: graceful decommissioning)
Build Instance fleets with mix of different instance
types (c5/r5..) and different markets (spot/on-demand)
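An instance fleet mixing instance types and markets might be described as follows. The field names follow the shape of the EMR instance fleet configuration; the fleet name, instance types, capacities, and timeout are hypothetical, and the code only builds the dictionary without calling AWS.

```python
def build_core_fleet(instance_types, on_demand_units, spot_units):
    """Return a CORE instance fleet definition mixing on-demand and spot capacity."""
    return {
        "Name": "core-fleet",
        "InstanceFleetType": "CORE",
        # A small on-demand base keeps the cluster alive if spot is reclaimed.
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        # Several instance types give EMR more pools to pick capacity from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If spot capacity is not available in time, fall back to on-demand.
                "TimeoutDurationMinutes": 20,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }

fleet = build_core_fleet(
    ["c5.2xlarge", "r5.2xlarge", "m5.2xlarge"], on_demand_units=2, spot_units=6
)
# With boto3, a list of such fleets could go under
# Instances={"InstanceFleets": [...], "Ec2SubnetIds": [...]} in run_job_flow;
# listing several subnets leaves the Availability Zone choice to EMR.
```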
Choosing the Right DB
Days of one-size-fits-all DBs are over
DBs have become specialized based on their use
case, different types:
Relational, Key-Value, MPP, In-Memory,
Document, Graph, Time-Series, Ledger,
Columnar, Distributed, Object
E.g. it is better (faster & cheaper) to use a time-series DB
for storing & plotting time-based data compared to
an RDBMS.
AWS Quantum Ledger DB
(QLDB)
Keeps track of transparent, immutable, and
cryptographically verifiable transaction log data over
distributed ledgers
Every entry is written into a journal and cannot be
changed. Journal is append only and maintains two
states: Current & History
Each transaction generates a digest using a cryptographic hash
function (SHA-256), which guarantees integrity
Serverless, SQL support & ACID compliant
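The append-only journal with SHA-256 digests can be illustrated with a simple hash chain, where each entry's digest covers the previous digest plus the new payload. This is a teaching sketch of the general idea only; it is not QLDB's actual data structure or API, and the account documents are made up.

```python
import hashlib
import json

class Journal:
    """Append-only log where every entry is chained to the one before it."""

    GENESIS = "0" * 64  # placeholder digest preceding the first entry

    def __init__(self):
        self.entries = []

    def append(self, document):
        prev = self.entries[-1]["digest"] if self.entries else self.GENESIS
        payload = json.dumps(document, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self.entries.append({"payload": payload, "prev": prev, "digest": digest})
        return digest

    def verify(self):
        """Recompute every digest; any edited entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = hashlib.sha256(
                (prev + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev or entry["digest"] != expected:
                return False
            prev = entry["digest"]
        return True

journal = Journal()
journal.append({"account": "A-1", "balance": 100})
journal.append({"account": "A-1", "balance": 75})
intact = journal.verify()

# Tampering with an old entry is detected on the next verification.
journal.entries[0]["payload"] = json.dumps(
    {"account": "A-1", "balance": 999}, sort_keys=True
)
tamper_detected = not journal.verify()
```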
Machine Learning
Services
1. AWS SageMaker
Fully managed service to build/train/deploy
machine learning models.
Out-of-the-box optimization for the following ML
packages:
TensorFlow, Apache MXNet, PyTorch, Chainer, Scikit-learn, SparkML,
Horovod, Keras, and Gluon
Supports: Supervised, Unsupervised & Reinforcement
learning
Supported by EC2 P3dn.24xlarge (8 Tesla V100 GPUs)
Machine Learning
Services (contd..)
Framework & model agnostic (use from pre-
trained model library or bring your own)
Integrated with Jupyter notebook
On demand and scalable training clusters
Integrated with different AWS services like
Lambda, API Gateway, CloudWatch etc
Machine Learning
Services (contd..)
2. AWS SageMaker Ground Truth
Most ML effort is spent labeling the training
data
This service can help significantly reduce the
time, effort, and cost required to create datasets for
training
3. AWS SageMaker Neo
Container to deploy ML models on any hardware or
application
AWS Data Services
Service | Description
Data Migration Service | Transfers data to the AWS cloud; supports homogeneous migrations such as Oracle to Oracle, and heterogeneous migrations such as Oracle to Aurora
AWS Macie | Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS, such as PII and IP. It provides dashboards and alerts
AWS Direct Connect | Lets you establish a dedicated network connection between your network and an AWS facility. It saves on bandwidth cost and provides consistent network performance
AWS Snowball | An 80 TB physical device shipped to the client location, where the data is copied; the device is then shipped back to an AWS facility, where the data is loaded into AWS. Good for petabyte-scale data copies, saving money on transfer cost
Netflix Push Messaging
Case Study
Netflix had polling infrastructure to poll all its
client interfaces
Polling is inherently inefficient; they were able to
reduce web traffic by 12% by switching to push
They have open-sourced the complete push
messaging framework: Zuul
Available on GitHub:
https://github.com/Netflix/zuul/wiki/Push-Messaging
Netflix Case Study
Interesting Challenges
Zuul uses persistent connections, which makes it stateful; this
makes deployments difficult
Solved by using a cluster swap; however, this created the
problem of a ‘thundering herd’ (everyone trying to connect at the
same time)
Resolved by introducing a connection lifetime (~30 min) and
randomizing that lifetime
The Zuul push cluster can auto-scale based on the number of open
connections
AWS allows auto-scaling based on any metric defined in
CloudWatch
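Why randomizing the connection lifetime helps can be shown with a small simulation. The ~30 minute base comes from the talk; the +/-20% jitter, the 10,000 clients, and the function names are assumptions made for this sketch, not Zuul's actual settings or code.

```python
import random

BASE_LIFETIME_S = 30 * 60  # ~30 minutes, as mentioned in the talk
JITTER_FRACTION = 0.2      # hypothetical: spread lifetimes by +/-20%

def connection_lifetime(rng):
    """Return a lifetime in seconds, randomized around the base value."""
    jitter = rng.uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return BASE_LIFETIME_S * (1 + jitter)

def peak_reconnects_per_minute(lifetimes):
    """Count how many connections expire in the busiest one-minute bucket."""
    counts = {}
    for t in lifetimes:
        minute = int(t // 60)
        counts[minute] = counts.get(minute, 0) + 1
    return max(counts.values())

rng = random.Random(42)  # fixed seed so the simulation is repeatable
clients = 10_000

# Everyone connected at the same moment (e.g. right after a cluster swap).
fixed_peak = peak_reconnects_per_minute([BASE_LIFETIME_S] * clients)
lifetimes = [connection_lifetime(rng) for _ in range(clients)]
jittered_peak = peak_reconnects_per_minute(lifetimes)
```

With a fixed lifetime all clients reconnect in the same minute; with jitter the reconnects spread over roughly a 12-minute window, so the busiest minute sees only a fraction of the clients.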
AWS Forecast
(in beta)
Predicts future points in a time series given historical data
Uses deep learning models developed by Amazon
Accuracy is cornerstone of any forecasting; this service is
50% more accurate than traditional methods
It comes with 8 pre-packaged models: 5 custom-built
algorithms and 3 traditional ones for benchmarking
Inputs: Historical time series (e.g. electricity consumption
per year), any related data (weather), metadata (location)
Output: Ability to visualize the forecast and export via an
API
All forecasting is probabilistic for a specific prediction
interval, including margin of error
AWS Forecast
Models
Traditional: Exponential Smoothing, ARIMA, Prophet
Amazon: Auto Regressive LSTM, Spline Quantile Forecaster, Multi Horizon Quantile (MQ-RNN)
Pricing
Cost Type | Pricing
Generated forecasts | $0.60 per 1,000 forecasts
Data storage | $0.088 per GB
Training hours | $0.238 per hour
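A quick worked example of the pricing above, using the rates as quoted in the slides. The usage figures are hypothetical, and the sketch assumes costs scale linearly with usage; actual billing granularity may differ.

```python
# Rates as quoted in the slides; they may have changed since the talk.
PRICE_PER_1000_FORECASTS = 0.60
PRICE_PER_GB_STORAGE = 0.088
PRICE_PER_TRAINING_HOUR = 0.238

def estimate_cost(forecasts, storage_gb, training_hours):
    """Estimate the bill in USD, assuming simple linear pricing."""
    return round(
        forecasts / 1000 * PRICE_PER_1000_FORECASTS
        + storage_gb * PRICE_PER_GB_STORAGE
        + training_hours * PRICE_PER_TRAINING_HOUR,
        2,
    )

# Hypothetical usage: 50,000 forecasts, 20 GB stored, 10 training hours.
# 50 * 0.60 + 20 * 0.088 + 10 * 0.238 = 30.00 + 1.76 + 2.38
monthly = estimate_cost(forecasts=50_000, storage_gb=20, training_hours=10)
```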
New Terms Learnt
Term | Meaning
Dark Data | Data that is hidden in files or otherwise not accessible to the enterprise
Data Ponds | Data that lives in silos across the enterprise
Blast Radius | Impact of a deployment, e.g. a microservice deployment will have a smaller blast radius compared to a monolithic API
Thundering Herd | When everyone tries to connect at the same time, e.g. if your service goes down and, after it is restored, all users try to connect simultaneously, overwhelming the system
Data Decay | The value of data decreases over time; data is most valuable near its creation
Accelerate Innovation & Maximize Business Value w/
Serverless Apps
More Info
AWS Slide Deck
AWS Videos
AWS re:Invent Recap
In 2019: Dec 2 – Dec 6 @ Las Vegas

Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 

AWS Summit 2018 Summary

  • 1. AWS Summit Knowledge Share - ASHISH MRIG HTTPS://WWW.LINKEDIN.COM/IN/ASHISHMRIG/
  • 2. AWS Athena: New Features. New JDBC/ODBC drivers released, which are 2-5x faster; supports CTAS; the drivers integrate with MS Active Directory for access control (in lieu of access keys); supports views; introduced Athena Work Groups (in beta).
  • 3. Athena Work Groups. Can be defined to identify different types of workloads or teams; integrated with CloudWatch, allowing collection of metrics at the work group level; cost control: a query threshold limit can be set for each work group on usage (GB) or time; trigger alerts can be created and sent when a threshold is breached, with an option to fail the query as well; option to disable work groups.
  • 5. New Features in AWS (feature: description). EMR Notebooks: an EMR notebook is a "serverless" Jupyter notebook executed using an EMR cluster and managed via the EMR console; a notebook can be attached to any EMR cluster and is stored in S3. AWS Glue: support for Hive, Spark & Presto. S3 Select: available in Spark, Java, Python; objects must be in CSV, JSON, or Parquet with UTF-8 encoding. AWS Textract (in beta): automatically extracts text and data from scanned documents, including data stored in tables & forms. Predictive Scaling: ML-based feature that tries to predict the workload and scale accordingly. Serverless Aurora (in beta): serverless offering of Aurora, similar to Athena.
  • 6. AWS Glue. Crawlers for automatic data discovery that auto-generate schema & partitions; generates Python/Scala code which can be customized; job bookmarks keep track of data that is already processed; has a built-in scheduler and is integrated with AWS CloudWatch for notification; the catalog can be shared with Athena & Redshift Spectrum; fine-grained catalog permissions at the table or connection level.
  • 7. AWS Auto Scaling. Free service to scale EC2/ECS/Aurora/DynamoDB; scaling can be defined on any CloudWatch metric, including custom metrics; typical metrics used: CPU utilization, memory, incoming traffic; scaling options: manual/scheduled/dynamic; new option added: Predictive Scaling, which uses ML based on historical patterns of resource utilization, with daily forecasting at a 60-minute grain.
  • 8. Data Lake Architecture: Best Practices. Build decoupled systems: future-proof; design for the ability to scale indefinitely with new business; focus on core competencies and reduce dependencies on managing or building infrastructure; be very cost conscious and build a 'pay-for-what-you-use' architecture; enable your application to leverage ML.
  • 9. Data Lake Architecture: Best Practices. Design and build for multi-tenancy; consolidate small files before loading into S3; for full data scans, use the Avro file type; always preserve the raw data in IA or Glacier; use automated test suites on every release/commit.
  • 10. Best Practices for Data Lake Security. Encrypt data at rest (KMS) and in transit (SSL); set ownership of S3 buckets at the user/team level, which reduces the attack surface; disable S3 delete using IAM roles; buckets should always be created per business domain and should have security policies baked in; back up data across regions; allow S3 access based on tags (e.g. Redshift, HIPAA query) or IAM roles (dev/DS).
  • 11. Best Practices for Data Lake Security. Use AWS Config to detect & notify on any S3 policy changes; use AWS Macie to detect & classify PII and sensitive data; control data access through views and don't expose the core tables directly; use & leverage centralized data catalogs.
  • 12. EMR Best Practices. Run stateless: the metastore should be remote (MySQL or Glue); use a combination of spot instances to reduce cost (design for re-runnability); don't specify the Availability Zone, so that you get the cheapest instances; a single spot node termination will not interrupt the cluster (new feature: graceful decommissioning); build instance fleets with a mix of different instance types (c5/r5..) and different markets (spot/on-demand).
  • 13. Choosing the Right DB. The days of one-size-fits-all DBs are over; DBs have become specialized based on their use case, e.g. it is better (faster & cheaper) to use a time-series DB than an RDBMS for storing & plotting time-based data. Different types: Relational, Key-Value, MPP, In-Memory, Document, Graph, Time-Series, Ledger, Columnar, Distributed Object.
  • 14. AWS Quantum Ledger DB (QLDB). Keeps track of transparent, immutable, and cryptographically verifiable transaction log data over distributed ledgers; every entry is written into a journal and cannot be changed; the journal is append-only and maintains two states: Current & History; each txn generates a digest using a cryptographic hash function (SHA-256), which guarantees integrity; serverless, SQL support & ACID compliant.
  • 15. Machine Learning Services. 1. AWS SageMaker: fully managed service to build/train/deploy machine learning models; out-of-the-box optimization for the following ML packages: TensorFlow, Apache MXNet, PyTorch, Chainer, Scikit-learn, SparkML, Horovod, Keras, and Gluon; supports supervised, unsupervised & reinforcement learning; supported by EC2 P3dn.24xlarge (8 Tesla V100 GPUs).
  • 16. Machine Learning Services (contd..). Framework & model agnostic (use a model from the pre-trained model library or bring your own); integrated with Jupyter notebook; on-demand and scalable training clusters; integrated with different AWS services like Lambda, API Gateway, CloudWatch etc.
  • 17. Machine Learning Services (contd..). 2. AWS SageMaker Ground Truth: most ML work is spent labeling the training data; this package can significantly reduce the time, effort, and cost required to create datasets for training. 3. AWS SageMaker Neo: container to deploy the ML model on any hardware or application.
  • 18. AWS Data Services (service: description). Data Migration Service: transfers data to the AWS cloud; supports homogeneous migrations such as Oracle to Oracle, and heterogeneous migrations such as Oracle to Aurora. AWS Macie: Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS such as PII and IP; it provides you with dashboards and alerts. AWS Direct Connect: lets you establish a dedicated network connection between your network and an AWS facility; it saves on bandwidth cost and provides consistent network performance. AWS Snowball: an 80 TB physical device shipped to the client location, where the data is copied; it is then shipped back to an AWS facility for data copy into AWS. Good for petabyte-scale data copy to save money on transfer cost.
  • 19. Netflix Push Messaging Case Study. Netflix had a polling infrastructure to poll all its client interfaces; polling is inherently inefficient, and they were able to reduce web traffic by 12% by switching to push; they have open-sourced the complete push messaging framework, Zuul; available on GitHub: https://github.com/Netflix/zuul/wiki/Push-Messaging
  • 20. Netflix Case Study: Interesting Challenges. Zuul uses persistent connections, which makes it stateful and makes deployments difficult; this was solved by using a cluster swap, which however created a 'thundering herd' problem (everyone trying to connect at the same time); that was resolved by introducing a connection lifetime (~30 min) and randomizing it; the Zuul push cluster can auto-scale based on the number of open connections; AWS allows auto-scaling based on any metric defined in CloudWatch.
  • 21. AWS Forecast (in beta). Predicts future points in a time series given historical data; uses deep learning models developed by Amazon; accuracy is the cornerstone of any forecasting, and this service is 50% more accurate than traditional methods; it comes with 8 pre-packaged models: 5 custom-built algorithms and 3 traditional ones for benchmarking. Inputs: historical time series (e.g. electricity consumption per year), any related data (weather), metadata (location). Output: ability to visualize the forecast and export it via an API. All forecasting is probabilistic for a specific prediction interval, including margin of error.
  • 22. AWS Forecast Models. Traditional: Exponential Smoothing, ARIMA, Prophet. Amazon: Auto Regressive LSTM, Spline Quantile Forecaster, Multi Horizon Quantile (MQ-RNN). Pricing (cost type: price). Generated forecasts: $0.60 per 1,000 forecasts. Data storage: $0.088 per GB. Training hours: $0.238 per hour.
  • 23. New Terms Learnt (term: meaning). Dark Data: data that is hidden in files or otherwise not accessible to the enterprise. Data Ponds: data that lives in silos across the enterprise. Blast Radius: impact of a deployment, e.g. a microservice deployment will have a smaller blast radius than a monolithic API. Thundering Herd: when everyone tries to connect at the same time, e.g. if your service goes down and, after it is restored, all users try to connect simultaneously, overwhelming the system. Data Decay: the value of data decreases over time; data is most valuable near its creation.
  • 27. Accelerate Innovation & Maximize Business Value w/ Serverless Apps
  • 33. More Info: AWS Slide Deck; AWS Videos; AWS re:Invent Recap. In 2019: Dec 2 – Dec 6 @ Las Vegas
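
The thundering-herd fix described in the Netflix case study (a connection lifetime of roughly 30 minutes, randomized per connection) can be sketched in a few lines. This is only an illustration of the idea, not Zuul's actual code: the function names and the ±20% jitter spread are assumptions, and only the ~30 minute base value comes from the slide.

```python
import random
from typing import List

BASE_LIFETIME_S = 30 * 60   # ~30 minutes, as stated on the slide
JITTER_FRACTION = 0.2       # assumed spread (+/-20%); not from the talk


def connection_lifetime(rng: random.Random) -> float:
    """Return a randomized connection lifetime in seconds around the base value."""
    jitter = rng.uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return BASE_LIFETIME_S * (1.0 + jitter)


def reconnect_times(n_clients: int, seed: int = 0) -> List[float]:
    """Lifetimes for n clients that all connected at the same moment."""
    rng = random.Random(seed)
    return [connection_lifetime(rng) for _ in range(n_clients)]


if __name__ == "__main__":
    times = reconnect_times(10_000)
    # Without jitter every client would reconnect at exactly 1800 s;
    # with jitter the reconnects are spread over a window of several minutes.
    print(f"earliest: {min(times):.0f} s, latest: {max(times):.0f} s")
```

Because each client draws its own lifetime, clients that all connected during a cluster swap no longer expire and reconnect in the same instant.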
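
The QLDB slide describes an append-only journal whose integrity is backed by SHA-256 digests. A simplified hash chain shows why tampering with an earlier entry becomes detectable. This is a toy model under stated assumptions, not QLDB's API or its internal structure: the `Journal` class and its methods are hypothetical names made up for the illustration.

```python
import hashlib
import json


class Journal:
    """Toy append-only journal: each digest covers the entry and the previous digest."""

    def __init__(self) -> None:
        self._entries = []  # list of (serialized_payload, digest_hex) tuples

    def append(self, payload: dict) -> str:
        """Append an entry and return its SHA-256 digest."""
        prev = self._entries[-1][1] if self._entries else ""
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._entries.append((body, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every digest in order; any edited entry breaks the chain."""
        prev = ""
        for body, digest in self._entries:
            expected = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            prev = digest
        return True


if __name__ == "__main__":
    journal = Journal()
    journal.append({"account": "A", "amount": 100})
    journal.append({"account": "A", "amount": 75})
    print("journal intact:", journal.verify())
```

Since each digest depends on the one before it, publishing only the latest digest is enough to commit to the whole history in this simplified model.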

Editor's Notes

  1. CTAS is huge; work groups are essentially resource queues
  2. CTAS is huge
  3. CTAS is huge
  4. Predictive Scaling needs up to two weeks of historical data 
  5. The key usage is bit
  6. The highest-performing GPU instance in the cloud. Reinforcement learning: the model learns by interacting with real-world scenarios
  7. The highest-performing GPU instance in the cloud. Reinforcement learning: the model learns by interacting with real-world scenarios
  8. For example, building a computer vision system that is reliable enough to identify objects - such as traffic lights, stop signs, and pedestrians - requires thousands of hours of video recordings that consist of hundreds of millions of video frames. Each one of these frames needs all of the important elements like the road, other cars, and signage to be labeled by a human before any work can begin on the model you want to develop. Amazon SageMaker Ground Truth significantly reduces the time and effort required to create datasets for training to reduce costs. These savings are achieved by using machine learning to automatically label data. The model is able to get progressively better over time by continuously learning from labels created by human labelers.
  9. They also have a best-fit option, where AWS chooses the model based on the data