7. • Key Management Service (KMS)
• CloudHSM, On-premises HSM devices
• S3 - Server Side Encryption
• S3 - Client Side Encryption
• Redshift/RDS - KMS integration, HSM integration
• DynamoDB - KMS Server Side Encryption
• EMR File System
Data Security at Rest
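KMS-based encryption at rest works by envelope encryption: a fresh data key encrypts each payload, and a master key (the CMK) encrypts only the data key. A toy sketch of the idea, using XOR as a stand-in for AES (NOT secure, illustration only; all names are hypothetical):

```python
import os

def xor_bytes(data, key):
    """Toy 'cipher': XOR with a repeating key. Stands in for AES; NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def envelope_encrypt(plaintext, master_key):
    """Envelope encryption as KMS does it: a fresh data key encrypts the
    payload, and the master key encrypts (wraps) only the data key."""
    data_key = os.urandom(16)                      # per-object data key
    ciphertext = xor_bytes(plaintext, data_key)    # encrypt payload with data key
    wrapped_key = xor_bytes(data_key, master_key)  # wrap data key with master key
    return ciphertext, wrapped_key                 # store both; discard data_key

def envelope_decrypt(ciphertext, wrapped_key, master_key):
    data_key = xor_bytes(wrapped_key, master_key)  # unwrap the data key first
    return xor_bytes(ciphertext, data_key)

master = os.urandom(16)
ct, wk = envelope_encrypt(b"sensitive record", master)
assert envelope_decrypt(ct, wk, master) == b"sensitive record"
```

With real KMS, the `generate_data_key` API returns both the plaintext data key and its wrapped form, so the plaintext key never needs to be stored.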
8. • SSL/TLS
• SSH, SCP
• HTTPS
• AWS SDK, AWS Console
• Policy Enforcement
• S3 Bucket Policy
• EMR master-slave data encryption
Data Security in Transit
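A common way to enforce encryption in transit on S3 is a bucket policy that denies any request not made over HTTPS, using the `aws:SecureTransport` condition key. A minimal sketch that builds such a policy as a Python dict (the bucket name is a placeholder):

```python
import json

def deny_insecure_transport_policy(bucket):
    """Build an S3 bucket policy that rejects plain-HTTP (non-TLS) requests."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            # aws:SecureTransport is "false" when the request arrives over HTTP
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }

policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"))
# With boto3 this would be applied via:
#   boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=policy_json)
```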
12. An organization wants to encrypt data stored on Amazon RDS. Which of the following
options describes encryption in RDS?
• A. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by
default also encrypt snapshots as they are created. No additional configuration needs to be made
on the client side for this to work.
• B. Encryption cannot be enabled on RDS instances unless the keys are managed by KMS.
• C. Encryption can be enabled on RDS instances to encrypt the underlying storage, but you cannot
encrypt snapshots as they are created.
• D. Encryption can be enabled on RDS instances to encrypt the underlying storage, and this will by
default also encrypt snapshots as they are created. However, some additional configuration needs
to be made on the client side for this to work.
Sample Questions
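The correct choice above is A: RDS storage encryption is enabled at instance creation time, snapshots of an encrypted instance are encrypted automatically, and no client-side change is required. A minimal sketch of the parameters involved (all identifiers are hypothetical placeholders; the helper only builds the keyword arguments a boto3 `create_db_instance` call would receive):

```python
def encrypted_rds_params(instance_id, kms_key_id=None):
    """Build create_db_instance keyword arguments for an encrypted RDS instance."""
    params = {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.medium",
        "AllocatedStorage": 100,
        "StorageEncrypted": True,  # encrypts the underlying storage (and snapshots)
    }
    if kms_key_id:  # optional: a specific customer-managed CMK; default is aws/rds
        params["KmsKeyId"] = kms_key_id
    return params

params = encrypted_rds_params("example-db")
# With boto3: boto3.client("rds").create_db_instance(**params)
```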
13. • Amazon S3 encrypts your data at the object level as it writes it to disk in its data centers and
decrypts it when you access it. There are a few different options depending on how you choose to
manage the encryption keys. One of these options is SSE-S3 (Server-Side Encryption with
Amazon S3-Managed Keys); which of the following describes how SSE-S3 works?
• A. You manage the encryption keys and Amazon S3 manages the encryption, as it writes to disk,
and decrypts when you access the objects.
• B. Each object is encrypted with a unique key employing strong encryption. As an additional
safeguard, it encrypts the key itself with a master key that it regularly rotates.
• C. There are separate permissions of an envelope key, that provides extra protection against
unauthorized access to your objects in S3
• D. A randomly generated encryption key is returned from Amazon S3 that the client can use to
encrypt the object data.
Sample Questions
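The correct choice above is B. With SSE-S3 the client's only job is to request server-side encryption; S3 generates a unique per-object key, wraps it with a regularly rotated master key, and decrypts transparently on GET. A minimal sketch with hypothetical bucket/key names (the helper only builds the arguments a boto3 `put_object` call would receive):

```python
def sse_s3_put_params(bucket, key, body):
    """Build put_object arguments requesting SSE-S3 (AES-256, S3-managed keys)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",  # the only client-side change needed
    }

params = sse_s3_put_params("example-bucket", "reports/q1.csv", b"a,b\n1,2\n")
# With boto3: boto3.client("s3").put_object(**params)
```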
21. • An administrator has a 500-GB file in Amazon S3. The administrator runs a
nightly COPY command into a 10-node Amazon Redshift cluster. The
administrator wants to prepare the data to optimize performance of the COPY
command. How should the administrator prepare the data?
• A. Compress the file using gz compression.
• B. Split the file into 500 smaller files.
• C. Convert the file format to AVRO.
• D. Split the file into 10 files of equal size
Sample Questions
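The COPY command loads files from S3 in parallel, one file per slice at a time, so the best practice is to split the input into many roughly equal files whose count is a multiple of the cluster's total slice count. A minimal sketch of such a split (sizes are illustrative, not the exam's 500 GB):

```python
def split_into_chunks(data, num_chunks):
    """Split a byte payload into num_chunks roughly equal parts, so a parallel
    Redshift COPY can spread the load across all slices (the file count should
    be a multiple of the cluster's total slice count)."""
    chunk = -(-len(data) // num_chunks)  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]

parts = split_into_chunks(b"x" * 1000, 10)
assert len(parts) == 10 and b"".join(parts) == b"x" * 1000
```

In practice each part would be written to S3 under a common prefix (e.g. `s3://bucket/data/part-000`), and COPY would be pointed at that prefix.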
22. • You plan to use EMR to process a large amount of data that will eventually be
stored in S3. The data is currently on premises and will be migrated to AWS
using the Snowball service. The file sizes range from 300 MB to 500 MB. Over
the next 6 months, your company will migrate over 2 PB of data to S3, and costs
are a concern. Which compression algorithm provides you with the highest
compression ratio, allowing you to both maximize performance and minimize costs?
• A. bzip2
• B. Gzip
• C. Lzo
• D. Snappy
Sample Questions
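Of the four options, bzip2 generally achieves the highest compression ratio on text-like data (and, unlike gzip, is splittable, which helps EMR parallelism), while Snappy and LZO trade ratio for speed. Python's standard library can sketch the comparison; the sample payload below is hypothetical:

```python
import bz2
import gzip

# Hypothetical CSV-like payload standing in for taxi/transaction records.
data = (
    b"taxi_id,pickup,dropoff,fare\n"
    b"1042,2016-01-01 00:10,2016-01-01 00:25,12.5\n"
) * 5000

gz = gzip.compress(data)   # gzip: fast, moderate ratio, not splittable
bz = bz2.compress(data)    # bzip2: slower, typically the highest ratio

print(len(data), len(gz), len(bz))
```

Actual ratios depend heavily on the data; on large text corpora bzip2 usually produces the smallest output of the four.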
26. • Your organization is storing millions of sensitive transactions across thousands of 100-GB files that
must be encrypted in transit and at rest. Analysts concurrently depend on subsets of these files to
generate simulations used to steer business decisions, which consumes up to 5 TB of
storage. As the solutions architect, you are required to build a solution that can
accommodate both long-term storage and in-flight encryption of the data in a cost-effective way. How
would you do that?
• A. Store the full data set on encrypted EBS volumes, and regularly capture snapshots. Attach to EC2
and run simulation on EC2
• B. Use S3 with server side encryption, and run simulations on EMR
• C. Use HDFS on Amazon EMR, and run simulations on EMR
• D. Use Glacier with server side encryption, and run simulations on EC2
Sample Questions
30. • The company is an Uber-like startup focused on local transportation in New York City.
They want to build a real-time dashboard based on NYC taxi data, so they can
gain some understanding of demand. They want to understand
traffic/demand by geography.
Demo: Problem
31. • Approximately 800 transactions per second.
• Real time
• Visualize by geography
Demo
37. • Implement core AWS Big Data services according to basic architecture best
practices
• Design and maintain Big Data
• Leverage tools to automate data analysis
AWS Certified Big Data Specialty
38. • Recommended AWS Knowledge
• A minimum of 2 years’ experience using AWS technology
• AWS Security best practices
• Independently define AWS architecture and services and understand how
they integrate with each other.
• Define and architect AWS big data services and explain how they fit in the
data lifecycle of collection, ingestion, storage, processing, and visualization.
Knowledge requirement
39. • Recommended General IT Knowledge
• At least 5 years’ experience in a data analytics field
• Understand how to control access to secure data
• Understand the frameworks that underpin large scale distributed systems
like Hadoop/Spark and MPP data warehouses
• Understand the tools and design platforms that allow processing of data
from multiple heterogeneous sources with different frequencies
(batch/real-time)
• Capable of designing a scalable and cost-effective architecture to process
data
Suggested experience