Analyzing Large-Scale Data Easily and Quickly :: Ilho Kim, Solutions Architect :: Gaming on AWS 2016

Combinational Services for Data Analytics
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning. Labels: Streaming, Data Lake, Archive, Data Warehouse, Semi-structured, NoSQL, Predictive Models, Other Apps]

Create EC2 instance to generate logs
• AMI -> Public Images -> AMI Name: da-hands-on
• Select the AMI and click Launch
• Instance Type: t2.medium
• Tag: Name - myname-dev
• Security group with SSH ingress opened
$ aws ec2 create-security-group --group-name andy-ssh-sg \
    --description "open SSH only" --vpc-id vpc-33d27056
{
"GroupId": "sg-7f3dd918"
}

$ aws ec2 authorize-security-group-ingress --group-id sg-7f3dd918 \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 run-instances --image-id ami-5c2beb3d --count 1 \
    --instance-type t2.medium --key-name ilho_tokyo \
    --security-group-ids sg-7f3dd918 --subnet-id subnet-1a7bad43 \
    --associate-public-ip-address
{
"OwnerId": "806506827877",
"ReservationId": "r-a58c5e2a",
"Groups": [],
"Instances": [
{
"Monitoring": {
…..................

Create S3 bucket
• Bucket Name: myname-game-log
• Region: Tokyo
$ aws s3 mb s3://andy-game-log --region ap-northeast-1
make_bucket: andy-game-log

Generating Logs to Stream Them to Kinesis
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon DynamoDB, Amazon Machine Learning. Labels: Data Lake, Archive]

Create Kinesis Stream
• Stream Name: myname-game-stream
• Number of Shards: 1
$ aws kinesis create-stream --stream-name andy-game-stream --shard-count 1
$ aws kinesis list-streams

Launch Redshift
• Cluster Identifier: myname-game-dw
• Database Name: mynamegame
• Database Port: 5439 (default)
• Node Type: dc1.large
• Cluster Type: Single Node
• Number of Compute Nodes: 1 (this field is required only for multi-node clusters)

$ aws redshift create-cluster --cluster-identifier andy-game-dw \
    --db-name mydb --node-type dc1.large --cluster-type single-node \
    --publicly-accessible --master-username admin \
    --master-user-password GamingonAWS2016
{
"Cluster": {
"IamRoles": [],
"ClusterVersion": "1.0",
"NodeType": "dc1.large",
"PubliclyAccessible": true,
"Tags": [],
"MasterUsername": "admin",
"ClusterParameterGroups": [
{
"ParameterGroupName": "default.redshift-1.0",
"ParameterApplyStatus": "in-sync"
}
],
"Encrypted": false,
…....................

Let’s connect to the EC2 instance
$ ssh -i [$mykey].pem ec2-user@xx.xx.xx.xx

Prepared Python demo scripts
[ec2-user@ip-10-10-0-13 data_analytics_demo]$ ls -1
amazon_kclpy
amazon_kclpy_helper.py
config.json
config.py
config.pyc
consumer.properties
consumer.py
demo_util.py
demo_util.pyc
inserter.py
kcl
kinesis_helper.py
kinesis_helper.pyc
LICENSE
logs
reader.py
run_consumer.sh
simulator.py
summarizer.py

Generating Logs to Kinesis stream
$ python simulator.py
https://github.com/awslabs/kinesis-poster-worker
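
The source of simulator.py is not shown in these slides, so the following is only a minimal sketch of what a log generator that streams to Kinesis might look like. The event fields (`event_id`, `user_id`, `action`, `ts`) and the helper names are hypothetical. The `client` argument is assumed to be a boto3 Kinesis client, whose `put_records` call accepts at most 500 records per request.

```python
import json
import random
import time
import uuid

MAX_BATCH = 500  # Kinesis PutRecords accepts at most 500 records per call


def make_event(rng):
    """Build one hypothetical game log event (field names are illustrative)."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": "user-%04d" % rng.randint(1, 1000),
        "action": rng.choice(["login", "purchase", "level_up", "logout"]),
        "ts": int(time.time()),
    }


def to_kinesis_records(events):
    """Convert events into the entry shape that PutRecords expects."""
    return [
        {
            # One JSON document per line, encoded as bytes.
            "Data": (json.dumps(e) + "\n").encode("utf-8"),
            # Partitioning by user keeps one user's events on one shard.
            "PartitionKey": e["user_id"],
        }
        for e in events
    ]


def send(client, stream_name, records):
    """Send records in batches of up to MAX_BATCH; return the number of calls."""
    calls = 0
    for i in range(0, len(records), MAX_BATCH):
        client.put_records(StreamName=stream_name, Records=records[i:i + MAX_BATCH])
        calls += 1
    return calls
```

With boto3 installed and credentials configured, this could be called as `send(boto3.client("kinesis", region_name="ap-northeast-1"), "myname-game-stream", to_kinesis_records(events))`. The sketch does not retry records that PutRecords reports as failed, which a real producer would need to do.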

Consuming Logs from Kinesis stream
$ python amazon_kclpy_helper.py --print_command --java $(which java) \
    --properties ./consumer.properties
https://github.com/awslabs/amazon-kinesis-client-python

Checking Logs files in S3 bucket
$ aws s3 ls myname-game-log

What we’ve done so far
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon DynamoDB, Amazon Machine Learning. Labels: Data Lake, Archive]

Copy Log data from S3 to Redshift
$ python inserter.py
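
The slides do not show what inserter.py runs, but loading S3 data into Redshift is normally done with a COPY statement. The sketch below only builds such a statement as a string. The table name, S3 prefix, and IAM role ARN are hypothetical, and it assumes the log files are JSON documents whose keys match the table's column names.

```python
def build_copy_sql(table, bucket, prefix, iam_role_arn, region):
    """Return a Redshift COPY statement that loads JSON log files from S3.

    All argument values are supplied by the caller; nothing here is taken
    from the demo's actual configuration.
    """
    return (
        "COPY {table} FROM 's3://{bucket}/{prefix}' "
        "IAM_ROLE '{role}' "
        "FORMAT AS JSON 'auto' "
        "REGION '{region}';"
    ).format(table=table, bucket=bucket, prefix=prefix,
             role=iam_role_arn, region=region)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_copy_sql(
        "log", "myname-game-log", "logs/",
        "arn:aws:iam::123456789012:role/demo-role", "ap-northeast-1"))
```

The resulting statement could be executed through any PostgreSQL-compatible driver connected to the cluster on port 5439.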

Checking log tables in Redshift
$ psql -h hostname -p 5439 -U username -d dbname
dbname=# select * from log limit 10;
….............

Creating a new table in Redshift
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon DynamoDB, Amazon Machine Learning. Labels: Data Lake, Archive. Annotation: Creating summary tables from log table]

Creating a summary table from log table
$ python summarizer.py
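
summarizer.py itself is not shown, so the following is only an illustration of the idea: a summary table built from the log table with CREATE TABLE ... AS SELECT. To keep the example runnable without a cluster, it uses Python's built-in sqlite3 as a stand-in for Redshift. The column names (`user_id`, `action`, `ts`) and the `log_summary` table name are hypothetical.

```python
import sqlite3

# Aggregate raw log rows into one row per action.
SUMMARY_SQL = """
CREATE TABLE log_summary AS
SELECT action,
       COUNT(*) AS event_count,
       COUNT(DISTINCT user_id) AS user_count
FROM log
GROUP BY action
"""


def summarize(conn):
    """Rebuild the summary table and return its rows ordered by action."""
    conn.execute("DROP TABLE IF EXISTS log_summary")
    conn.execute(SUMMARY_SQL)
    return conn.execute(
        "SELECT action, event_count, user_count FROM log_summary ORDER BY action"
    ).fetchall()


def demo():
    """Load a few sample rows into an in-memory database and summarize them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (user_id TEXT, action TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO log VALUES (?, ?, ?)",
        [("u1", "login", 1), ("u2", "login", 2),
         ("u1", "purchase", 3), ("u1", "login", 4)],
    )
    return summarize(conn)


if __name__ == "__main__":
    print(demo())
```

A small pre-aggregated table like this is what a BI tool would typically query, rather than scanning the raw log table on every dashboard refresh.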

Run Business Intelligence Tools
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning. Labels: Data Lake, Archive]

Loading Streaming Data into Amazon Elasticsearch Service
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning, Amazon Elasticsearch Service. Labels: Data Lake, Archive]

Launch Elasticsearch
• Go to the AWS Management Console
• Launch an Elasticsearch domain
• Set the access policy to open (public) access, for demo purposes only

Loading Streaming Data into Amazon Elasticsearch Service
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning, Amazon Elasticsearch Service, AWS Lambda. Labels: Data Lake, Archive]

Creating and configuring Lambda function
• https://github.com/awslabs/amazon-elasticsearch-lambda-samples
• Download a sample JS file
• Install the required Node.js packages
• Modify the Elasticsearch endpoint
• Zip all files, including node_modules
• Upload the zip file to the Lambda function
• Set the Lambda role to allow access to Elasticsearch
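
The linked sample is a JavaScript file. The sketch below is not that sample's code. It restates the core transformation in Python under stated assumptions: Lambda delivers Kinesis records base64-encoded under `Records[].kinesis.data`, each payload is one JSON document, and the output is a body for the Elasticsearch `_bulk` API. The function name, index name, and document type are hypothetical.

```python
import base64
import json


def kinesis_event_to_bulk(event, index, doc_type="log"):
    """Turn a Kinesis-triggered Lambda event into an Elasticsearch bulk body.

    The bulk format is newline-delimited JSON: an action line followed by
    the document line, with a trailing newline at the end of the body.
    """
    lines = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        doc = json.loads(payload)
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""


def handler(event, context):
    """Hypothetical Lambda entry point; the HTTP POST to the domain is omitted."""
    body = kinesis_event_to_bulk(event, "game-log")
    return {"bulk_lines": body.count("\n")}
```

The `_type` field matches Elasticsearch versions of the period and is not used by later versions. Sending the body to the domain endpoint, including request signing if the access policy requires it, is left out of this sketch.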

Checking results in Kibana
• Query
• Result

Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming?

EMR and Spark Streaming
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning, Amazon Elasticsearch Service, AWS Lambda. Labels: Data Lake, Archive]

Creating an EMR cluster with Spark is simple
$ aws emr create-cluster --release-label emr-4.2.0 \
    --applications Name=Spark Name=Hive \
    --ec2-attributes KeyName=myKey --use-default-roles \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
      InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
    --bootstrap-actions Path=s3://aws-bigdata-blog/artifacts/Querying_Amazon_Kinesis/DownloadKCLtoEMR400.sh,Name=InstallKCLLibs

Managing resources is easy, but building the logic is complicated.

Amazon Kinesis Analytics: Real-time Log Analytics
• A fully managed service for continuously querying streaming data using standard SQL
• Use cases: preprocessing streams / most frequently occurring values / counting distinct values / simple alerts / detecting anomalies on a stream / post-processing in an application stream
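
To make the "most frequently occurring values" use case concrete, here is a plain-Python stand-in for the kind of windowed query such a service evaluates continuously. It is not Kinesis Analytics code or its API. The event fields (`ts`, `action`) and the function name are hypothetical, and it processes a finite list rather than a live stream.

```python
from collections import Counter, defaultdict


def top_actions_per_window(events, window_seconds=60, k=1):
    """Return the k most frequent actions in each tumbling time window.

    Windows are fixed-size and non-overlapping; each event is assigned to
    the window whose start is its timestamp rounded down to the window size.
    """
    windows = defaultdict(Counter)
    for e in events:
        start = e["ts"] - e["ts"] % window_seconds
        windows[start][e["action"]] += 1
    return {start: counts.most_common(k)
            for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    sample = [
        {"ts": 0, "action": "login"},
        {"ts": 10, "action": "login"},
        {"ts": 20, "action": "purchase"},
        {"ts": 65, "action": "purchase"},
    ]
    print(top_actions_per_window(sample))
```

In a streaming SQL service the same logic would be written declaratively as a GROUP BY over a time window, which is the part the slides describe as easier than hand-built Spark Streaming logic.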

Amazon Kinesis Analytics
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning, Amazon Elasticsearch Service, AWS Lambda, Amazon Kinesis Analytics. Labels: Data Lake, Archive]

Adding Amazon Machine Learning

Adding Amazon Machine Learning is your homework.
Using Amazon Machine Learning in Games :: Ilho Kim :: AWS Summit Seoul 2016
https://www.youtube.com/watch?v=Bs1QZMlwmLM&feature=youtu.be

A hint 
[Architecture diagram. Services: Log Generator, Amazon Kinesis, Amazon Simple Storage Service, Amazon Glacier, Amazon Elastic MapReduce, Amazon DynamoDB, Amazon Machine Learning, Amazon Elasticsearch Service, AWS Lambda, Amazon Kinesis Analytics. Labels: Data Lake, Archive]
