Big Data
AWS Big Data Well-Architected
Instructor: Jolay Zhang
Introduction
Jolay Zhang
[Timeline figure: 2003 to 2019]
Big Data Well-Architected
Basically Available, Soft state, Eventual consistency vs Atomicity, Consistency,
Isolation, Durability
BASE vs ACID
• Data Security
• Scalability
• Performance Efficiency
• Cost Optimization
• Operational Excellence
• Reliability
• Disaster Recovery
• Migration and Hybrid System
Big Data Well-Architected
Data Security Pillar
• Key Management Service (KMS)
• CloudHSM, On-premises HSM devices
• S3 - Server Side Encryption
• S3 - Client Side Encryption
• Redshift/RDS - KMS integration, HSM integration
• DynamoDB - KMS Server Side Encryption
• EMR File System
Data Security at Rest
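As a rough illustration of the S3 server-side encryption options listed above, the sketch below builds the keyword arguments one might pass to boto3's `put_object`. The helper name, bucket, object key, and KMS key ID are hypothetical placeholders, and no AWS call is made here.

```python
def sse_put_args(bucket, key, body, kms_key_id=None):
    """Build put_object keyword arguments that request server-side encryption."""
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id is None:
        # SSE-S3: S3 manages the encryption keys.
        args["ServerSideEncryption"] = "AES256"
    else:
        # SSE-KMS: S3 encrypts with a key managed in KMS.
        args["ServerSideEncryption"] = "aws:kms"
        args["SSEKMSKeyId"] = kms_key_id
    return args


# Hypothetical values, for illustration only.
kms_args = sse_put_args("example-bucket", "raw/trips.csv", b"a,b\n", "example-kms-key-id")
s3_args = sse_put_args("example-bucket", "raw/trips.csv", b"a,b\n")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("s3").put_object(**kms_args)
```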
• SSL/TLS
• SSH, SCP
• HTTPS
• AWS SDK, AWS Console
• Policy Enforcement
• S3 Bucket Policy
• EMR master-slave data encryption
Data Security in Transit
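One common way to enforce encryption in transit with an S3 bucket policy is to deny any request that does not arrive over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document; the function name and bucket name are assumptions made for illustration.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Return a bucket policy document that denies requests not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches requests made over plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder name.
policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
print(policy_json)
```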
• VPC (Virtual Private Cloud)
• Subnet
• Route Table
• Internet Gateway
• Security Group
• ACL
• Egress-Only Internet Gateway (IPv6 only)
• VPN (Virtual Private Network)
• NAT Gateways (Network Address Translation)
• Endpoint Services
• Transit Gateway
Network Isolation
• IAM Policy
• S3 Bucket Policy
• DynamoDB Policy
• Glue Data Catalog Policy
Fine-Grained Permission Control
• EMR Kerberos Integration
EMR Authentication & Authorization
An organization wants to encrypt data stored on Amazon RDS. Which of the following options
describes encryption in RDS?
• A. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by
default also encrypt snapshots as they are created. No additional configuration needs to be made
on the client side for this to work.
• B. Encryption cannot be enabled on RDS instances unless the keys are not managed by KMS.
• C. Encryption can be enabled on RDS instances to encrypt the underlying storage, but you cannot
encrypt snapshots as they are created.
• D. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by
default also encrypt snapshots as they are created. However, some additional configuration needs
to be made on the client side for this to work.
Sample Questions
• Amazon S3 encrypts your data at the object level as it writes it to disk in its data centers and
decrypts it when you access it. There are a few different options depending on how you choose to
manage the keys for encryption. One of these options is called SSE-S3 (Server Side Encryption with
S3 Keys); which of the following methods describes how SSE-S3 works?
• A. You manage the encryption keys and Amazon S3 manages the encryption, as it writes to disk,
and decrypts when you access the objects.
• B. Each object is encrypted with a unique key employing strong encryption. As an additional
safeguard, it encrypts the key itself with a master key that it regularly rotates.
• C. There are separate permissions for an envelope key, which provides extra protection against
unauthorized access to your objects in S3.
• D. A randomly generated encryption key is returned from Amazon S3 that the client can use to
encrypt the object data.
Sample Questions
Scalability Pillar
• Kinesis Streams - Sharding
• DynamoDB - Provisioned Throughput
• Redshift - Provision EC2 instances
• EMR - Provision EC2 instances
• Elasticsearch/CloudSearch - Provision EC2 instances
• Glue ETL - DPU
Scalability
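To make "provisioned throughput" concrete, here is a small sketch of the usual DynamoDB capacity arithmetic: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1 KB. The function names and the sample workload are assumptions for illustration.

```python
import math


def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Estimate RCUs: item size is rounded up to the next 4 KB block."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    # Eventually consistent reads consume half the capacity.
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(writes_per_sec, item_kb):
    """Estimate WCUs: item size is rounded up to the next 1 KB block."""
    return writes_per_sec * math.ceil(item_kb)


# Hypothetical workload: 100 reads/s of 6 KB items, 50 writes/s of 1.5 KB items.
rcu_strong = read_capacity_units(100, 6)
rcu_eventual = read_capacity_units(100, 6, strongly_consistent=False)
wcu = write_capacity_units(50, 1.5)
```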
DynamoDB - Consistent Hashing
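The following is a generic sketch of a consistent hash ring with virtual nodes, meant only to illustrate the idea named on this slide. It is not DynamoDB's actual partitioning code; the class name, node names, and the choice of SHA-256 are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """A minimal consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed at several points on the ring.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Adding a node should only move keys onto the new node.
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"item-{i}" for i in range(1000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

The point of the design is that growing the cluster relocates only the keys claimed by the new node, rather than reshuffling everything as a plain `hash(key) % n` scheme would.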
Performance Efficiency Pillar
• S3 File Format, Encryption, Partition, Compression
• DynamoDB Hash Key, Range Key, Secondary Index
• EMR/EC2 instance type
• Redshift Distribution Styles, Sort Keys, Compression…
• Athena/Glue Partitioning
Performance Efficiency
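Partitioning for Athena and Glue is commonly expressed through Hive-style `name=value` prefixes in the S3 object key, so that queries filtering on those columns can skip unrelated data. The sketch below builds such a key; the prefix, column names, and file name are hypothetical.

```python
from datetime import datetime


def partitioned_key(prefix, event_time, filename):
    """Build a Hive-style partitioned S3 key from an event timestamp."""
    return (
        f"{prefix}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{filename}"
    )


# Placeholder values for illustration.
key = partitioned_key("trips", datetime(2019, 1, 15), "part-0000.parquet")
print(key)
```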
File Format
Compression
Algorithm   Splittable         Compression Ratio   Compress/Decompress Speed
Gzip        No                 High                Medium
Bzip2       Yes                Very High           Slow
LZO         Yes (if indexed)   Low                 Fast
Snappy      No                 Low                 Very fast
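The ratio versus speed trade-off in the table can be tried out with Python's standard library, which ships gzip and bzip2 (LZO and Snappy need third-party packages and are left out of this sketch). The sample data is synthetic and repetitive, so the measured numbers are only indicative.

```python
import bz2
import gzip
import time


def compare(data):
    """Compress data with gzip and bzip2; report ratio and elapsed time."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: decompression must restore the original bytes.
        assert decompress(packed) == data
        results[name] = {"ratio": len(data) / len(packed), "seconds": elapsed}
    return results


# Synthetic CSV-like sample, made up for this example.
sample = b"2019-01-15,manhattan,brooklyn,12.50\n" * 20000
results = compare(sample)
for name, stats in results.items():
    print(f"{name}: ratio {stats['ratio']:.1f}x in {stats['seconds']:.4f}s")
```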
• An administrator has a 500-GB file in Amazon S3. The administrator runs a
nightly COPY command into a 10-node Amazon Redshift cluster. The
administrator wants to prepare the data to optimize performance of the COPY
command. How should the administrator prepare the data?
• A. Compress the file using gz compression.
• B. Split the file into 500 smaller files.
• C. Convert the file format to AVRO.
• D. Split the file into 10 files of equal size.
Sample Questions
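One way to reason about the COPY question above: Redshift loads files in parallel across slices, and the usual guidance is to split input into roughly equal files whose count is a multiple of the number of slices, each between about 1 MB and 1 GB after compression. The sketch below applies that rule of thumb; the function name and the figure of 2 slices per node are assumptions, since slice count depends on the node type.

```python
import math


def plan_split(total_gb, nodes, slices_per_node, max_file_gb=1.0):
    """Pick a file count that is a multiple of the slice count.

    Returns (number_of_files, size_per_file_gb).
    """
    slices = nodes * slices_per_node
    # Enough files to keep each one at or under max_file_gb.
    min_files = math.ceil(total_gb / max_file_gb)
    # Round up to the next multiple of the slice count.
    files = math.ceil(min_files / slices) * slices
    return files, total_gb / files


# 500 GB into a 10-node cluster, assuming 2 slices per node.
files, size_gb = plan_split(500, nodes=10, slices_per_node=2)
print(files, size_gb)
```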
• You plan to use EMR to process a large amount of data that will eventually be
stored in S3. The data is currently on premises and will be migrated to AWS
using the Snowball service. The file sizes range from 300 MB to 500 MB. Over
the next 6 months, your company will migrate over 2 PB of data to S3, and costs
are a concern. Which compression algorithm provides the highest compression
ratio, allowing you to both maximize performance and minimize costs?
• A. bzip2
• B. Gzip
• C. LZO
• D. Snappy
Sample Questions
Cost Optimization Pillar
• Different cost models
• Charge by API calls
• Charge by Instance running hours
• Charge by IO
• Spot Instance, Reserved Instance
• Free Tier
• Data lifecycle
• S3 - Storage Class
Cost Optimization
S3 Storage Class
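Storage classes are typically tied to the data lifecycle through S3 lifecycle rules. The sketch below builds a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the helper name, prefix, and day thresholds are illustrative assumptions, and no AWS call is made.

```python
def lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90,
                     expire_after_days=None):
    """Build a lifecycle configuration that tiers objects to cheaper storage."""
    rule = {
        "ID": f"tier-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Infrequent access first, then archival storage.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Placeholder prefix and thresholds.
config = lifecycle_config("logs/", expire_after_days=365)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```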
• Your organization is storing millions of sensitive transactions across thousands of 100 GB files that
must be encrypted in transit and at rest. Analysts concurrently depend on subsets of files to
generate simulations that can be used to steer business decisions, which consumes up to 5 TB of
storage. As the solutions architect, you are required to build a solution that can
accommodate long-term storage and in-flight data in a cost-effective way. How would you
do that?
• A. Store the full data set on encrypted EBS volumes, and regularly capture snapshots. Attach to EC2
and run simulation on EC2
• B. Use S3 with server side encryption, and run simulations on EMR
• C. Use HDFS on Amazon EMR, and run simulations on EMR
• D. Use Glacier with server side encryption, and run simulations on EC2
Sample Questions
Operational Excellence Pillar
• Auto Scaling
• EMR
• DynamoDB
• CloudFormation - Infrastructure As Code
• High Availability
• DynamoDB
• EMR multi-master support
Operational Excellence
Solve a real problem
• The company is an Uber-like startup focused on local transportation in New York City.
They want to build a real-time dashboard based on NYC taxi data to gain some
understanding of demand, including traffic and demand by geographic area.
Demo: Problem
• Approximately 800 transactions per second.
• Real time
• Visualize by geographic area
Demo
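A quick sizing sketch for the ingest side, assuming the stream is Kinesis Data Streams: each shard accepts up to 1,000 records per second and 1 MB per second of writes, so the shard count is driven by whichever limit is hit first. The average record size and the headroom factor are assumptions, since the slides give only the transaction rate.

```python
import math


def shards_needed(records_per_sec, avg_record_kb, headroom=1.0):
    """Estimate the number of shards from per-shard write limits."""
    by_count = records_per_sec / 1000                  # 1,000 records/s per shard
    by_bytes = records_per_sec * avg_record_kb / 1024  # 1 MB/s per shard
    return max(1, math.ceil(max(by_count, by_bytes) * headroom))


# 800 transactions per second, with assumed record sizes.
small_records = shards_needed(800, avg_record_kb=1)
larger_records = shards_needed(800, avg_record_kb=2)
with_headroom = shards_needed(800, avg_record_kb=1, headroom=1.5)
```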
Demo: Architecture
Demo: Is it the best solution?
[Comparison table; columns: EMR, Kinesis Stream, Elasticsearch; rows: Security, Fault Tolerance, Scalability, Cost]
About AWS Certification
AWS Certifications
AWS Certified Big Data Specialty
• Implement core AWS Big Data services according to basic architecture best
practices
• Design and maintain Big Data
• Leverage tools to automate data analysis
AWS Certified Big Data Specialty
• Recommended AWS Knowledge
• A minimum of 2 years’ experience using AWS technology
• AWS Security best practices
• Independently define AWS architecture and services and understand how
they integrate with each other.
• Define and architect AWS big data services and explain how they fit in the
data lifecycle of collection, ingestion, storage, processing, and visualization.
Knowledge requirement
• Recommended General IT Knowledge
• At least 5 years’ experience in a data analytics field
• Understand how to control access to secure data
• Understand the frameworks that underpin large scale distributed systems
like Hadoop/Spark and MPP data warehouses
• Understand the tools and design platforms that allow processing of data
from multiple heterogeneous sources with different frequencies
(batch/real-time)
• Capable of designing a scalable and cost-effective architecture to process
data
Suggested experience
Follow a Learning Path
Q & A
TYPE OF DATA JOB SEEKERS

AWS Well Architected-Info Session WeCloudData

  • 1. Big Data AWS Big Data Well-Architected Instructor: Jolay Zhang
  • 2. 2003 2007 2012 2013 2014 2017 2015 2016 2018 2019 Introduction Jolay Zhang 2010
  • 4. Basically Available, Soft state, Eventual consistency vs Atomicity Consistency Isolation Durability BASE vs ACID
  • 5. • Data Security • Scalability • Performance Efficiency • Cost Optimization • Operational Excellent • Reliability • Disaster Recovery • Migration and Hybrid System Big Data Well-Architected
  • 7. • Key Management Service (KMS) • CloudHSM, On-premises HSM devices • S3 - Server Side Encryption • S3 - Client Side Encryption • Redshift/RDS - KMS integration, HSM integration • DynamoDB - KMS Server Side Encryption • EMR File System Data Security at Rest
  • 8. • SSL/TLS • SSH, SCP • HTTPS • AWS SDK, AWS Console • Policy Enforcement • S3 Bucket Policy • EMR master-slave data encryption Data Security in Transit
  • 9. • VPC (Virtual Private Cloud) • Subnet • Route Table • Internet Gateway • Security Group • ACL • Egress-Only Internet Gateway (IP V6 only) • VPN (Virtual Private Network) • NAT Gateways (Network Address Translation) • Endpoint Services • Transit Gateway Network Isolation
  • 10. • IAM Policy • S3 Bucket Policy • DynamoDB Policy • Glue Data Catalog Policy Fine-Grained Permission Control
  • 11. • EMR Kerberos Integration EMR Authentication & Authorization
  • 12. An organization wants to encrypt data stored on Amazon RDS. Which of the following options describes encryption in RDS? • A. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by default also encrypt snapshots as they are created. No additional configuration needs to be made on the client side for this to work. • B. Encryption cannot be enabled on RDS instances unless the keys are not managed by KMS. • C. Encryption can be enabled on RDS instances to encrypt the underlying storage, but you cannot encrypt snapshots as they are created. • D. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by default also encrypt snapshots as they are created. However, some additional configuration needs to be made on the client side for this to work. Sample Questions
  • 13. • Amazon S3 encrypts your data at the object level as it writes it to disk in its data centers and decrypts it when you access it. There are a few different options depending on how you choose to manage the keys for encryption. One of these options is called SSE-S3 (Server Side Encryption with S3 Keys). Which of the following describes how SSE-S3 works? • A. You manage the encryption keys and Amazon S3 manages the encryption, as it writes to disk, and decrypts when you access the objects. • B. Each object is encrypted with a unique key employing strong encryption. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. • C. There are separate permissions for the use of an envelope key, which provides extra protection against unauthorized access to your objects in S3. • D. A randomly generated encryption key is returned from Amazon S3 that the client can use to encrypt the object data. Sample Questions
  • 15. • Kinesis Streams - Sharding • DynamoDB - Provisioned Throughput • Redshift - Provision EC2 instances • EMR - Provision EC2 instances • ElasticSearch/CloudSearch - Provision EC2 instances • Glue ETL - DPU Scalability
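For the "DynamoDB - Provisioned Throughput" item, capacity is sized in read and write capacity units. The following rough sketch shows the arithmetic, using the standard unit definitions: one WCU covers one write per second of an item up to 1 KB, and one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The workload numbers are illustrative assumptions, not figures from the slides.

```python
import math


def write_capacity_units(writes_per_sec: int, item_size_bytes: int) -> int:
    """One WCU covers one write per second of an item up to 1 KB."""
    return writes_per_sec * math.ceil(item_size_bytes / 1024)


def read_capacity_units(reads_per_sec: int, item_size_bytes: int,
                        strongly_consistent: bool = True) -> int:
    """One RCU covers one strongly consistent read per second of an item
    up to 4 KB; eventually consistent reads cost half as much."""
    units = reads_per_sec * math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else math.ceil(units / 2)


# Hypothetical workload: 100 operations per second.
print(write_capacity_units(100, 1500))                            # 200
print(read_capacity_units(100, 6000))                             # 200
print(read_capacity_units(100, 6000, strongly_consistent=False))  # 100
```

Item sizes are rounded up to the next unit boundary, so trimming an item from just over 1 KB to just under it can halve the write capacity required.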
  • 18. • S3 File Format, Encryption, Partition, Compression • DynamoDB Hash Key, Range Key, Secondary Index • EMR/EC2 instance type • Redshift Distribution Styles, Sort Keys, Compression… • Athena/Glue Partitioning Performance Efficiency
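Partitioning for S3, Athena and Glue is usually expressed through the object key layout. The following is a minimal sketch of building Hive-style partition prefixes (`key=value` path segments), which Athena and Glue can map to partition columns. The table name, file name and date-based partition scheme are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def partitioned_key(table: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=)."""
    return (
        f"{table}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{filename}"
    )


# Hypothetical table and file names.
key = partitioned_key(
    "trips",
    datetime(2019, 1, 15, tzinfo=timezone.utc),
    "part-0000.parquet",
)
print(key)  # trips/year=2019/month=01/day=15/part-0000.parquet
```

A query that filters on `year`, `month` and `day` then only needs to read the objects under the matching prefixes instead of scanning the whole table.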
  • 20. Compression Algorithm   Splittable   Compression Ratio   Compress/Decompress Speed
        Gzip                    No           High                Medium
        Bzip2                   Yes          Very High           Slow
        LZO                     Yes          Low                 Fast
        Snappy                  No           Low                 Very fast
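The compression-ratio column can be checked informally with the Python standard library, which ships gzip and bzip2 codecs (LZO and Snappy need third-party packages and are left out here). This is a small sketch on synthetic, highly repetitive CSV-like data. The sample rows are made up, and real ratios depend heavily on the data.

```python
import bz2
import gzip


def compression_ratios(data: bytes) -> dict:
    """Return original_size / compressed_size for gzip and bzip2."""
    compressed = {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
    }
    return {name: len(data) / len(blob) for name, blob in compressed.items()}


# Synthetic sample: one made-up row repeated many times.
sample = b"2019-01-15,yellow,JFK,Manhattan,52.00\n" * 10_000

for name, ratio in compression_ratios(sample).items():
    print(f"{name}: {ratio:.1f}x")
```

Timing the two `compress` calls on a representative file is an easy way to see the speed trade-off as well.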
  • 21. • An administrator has a 500-GB file in Amazon S3. The administrator runs a nightly COPY command into a 10-node Amazon Redshift cluster. The administrator wants to prepare the data to optimize performance of the COPY command. How should the administrator prepare the data? • A. Compress the file using gz compression. • B. Split the file into 500 smaller files. • C. Convert the file format to AVRO. • D. Split the file into 10 files of equal size Sample Questions
  • 22. • You plan to use EMR to process a large amount of data that will eventually be stored in S3. The data is currently on-premises and will be migrated to AWS using the Snowball service. The file sizes range from 300 MB to 500 MB. Over the next 6 months, your company will migrate over 2 PB of data to S3, and costs are a concern. Which compression algorithm provides the highest compression ratio, allowing you to both maximize performance and minimize costs? • A. bzip2 • B. Gzip • C. Lzo • D. Snappy Sample Questions
  • 24. • Different cost models • Charge by API calls • Charge by Instance running hours • Charge by IO • Spot Instance, Reserved Instance • Free Tier • Data lifecycle • S3 - Storage Class Cost Optimization
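The "Data lifecycle" and "S3 - Storage Class" items are typically implemented as an S3 lifecycle rule that moves objects to cheaper storage classes as they age. The following sketch builds such a configuration as a plain dictionary, in the shape boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The prefix, rule ID and day thresholds are assumptions for illustration, and no AWS call is made.

```python
import json


def lifecycle_configuration(prefix: str) -> dict:
    """Tier objects under `prefix` to cheaper storage, then expire them."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Infrequent-access storage after 30 days.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Archive storage after 90 days.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete objects after one year.
                "Expiration": {"Days": 365},
            }
        ]
    }


config = lifecycle_configuration("raw/")
print(json.dumps(config, indent=2))
```

The thresholds should follow actual access patterns: data that analysts still query regularly is usually cheaper to keep in a standard class than to retrieve repeatedly from an archive tier.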
  • 26. • Your organization is storing millions of sensitive transactions across thousands of 100 GB files that must be encrypted in transit and at rest. Analysts concurrently depend on subsets of files to generate simulations that can be used to steer business decisions, which consumes up to 5 TB of storage. As the solutions architect, you are required to build a solution that can accommodate both long-term storage and in-flight data in a cost-effective way. How would you do that? • A. Store the full data set on encrypted EBS volumes, and regularly capture snapshots. Attach to EC2 and run simulations on EC2 • B. Use S3 with server side encryption, and run simulations on EMR • C. Use HDFS on Amazon EMR, and run simulations on EMR • D. Use Glacier with server side encryption, and run simulations on EC2 Sample Questions
  • 28. • Auto Scaling • EMR • DynamoDB • CloudFormation - Infrastructure As Code • High Availability • DynamoDB • EMR multi-master support Operational Excellence
  • 29. Solve a real problem
  • 30. • The company is an Uber-like startup focused on local transportation in New York City. They want to build a real-time dashboard based on NYC taxi data so they can get some understanding of demand. They want to understand traffic and demand by geography. Demo: Problem
  • 31. • Approximately 800 transactions per second. • Real time • Visualize by geography Demo
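The figure of 800 transactions per second can be turned into a rough Kinesis shard estimate. The sketch below assumes the standard per-shard ingest limits of 1,000 records per second and 1 MB per second (treated conservatively as 10^6 bytes). The average record sizes are assumptions, since the slides do not state one.

```python
import math

SHARD_RECORDS_PER_SEC = 1000        # per-shard ingest limit (records)
SHARD_WRITE_BYTES_PER_SEC = 10**6   # per-shard ingest limit, taken as 10^6 bytes


def shards_needed(records_per_sec: float, avg_record_bytes: int) -> int:
    """Minimum shard count that satisfies both ingest limits."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(
        records_per_sec * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC
    )
    return max(1, by_count, by_bytes)


# Assumed average record sizes of 1 KB and 2 KB.
print(shards_needed(800, 1024))  # 1
print(shards_needed(800, 2048))  # 2
```

This is a lower bound with no headroom. A real deployment would usually add capacity for traffic spikes and also check the per-shard read limit if several consumers share the stream.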
  • 33. Demo: Is it the best solution? • EMR • Kinesis Stream • Elasticsearch • Security • Fault Tolerance • Scalability • Cost
  • 36. AWS Certified Big Data Specialty
  • 37. • Implement core AWS Big Data services according to basic architecture best practices • Design and maintain Big Data • Leverage tools to automate data analysis AWS Certified Big Data Specialty
  • 38. • Recommended AWS Knowledge • A minimum of 2 years’ experience using AWS technology • AWS Security best practices • Independently define AWS architecture and services and understand how they integrate with each other. • Define and architect AWS big data services and explain how they fit in the data lifecycle of collection, ingestion, storage, processing, and visualization. Knowledge requirement
  • 39. • Recommended General IT Knowledge • At least 5 years’ experience in a data analytics field • Understand how to control access to secure data • Understand the frameworks that underpin large scale distributed systems like Hadoop/Spark and MPP data warehouses • Understand the tools and design platforms that allow processing of data from multiple heterogeneous sources with different frequencies (batch/real-time) • Capable of designing a scalable and cost-effective architecture to process data Suggested experience
  • 41. Q & A
  • 42. TYPE OF DATA JOB SEEKERS