Eight's Behavior Log Utilization Platform
Built with AWS Services
Sansan Eight
AWS Dev Day Tokyo 2018
Database Track #04 Nov. 01, 2018
Sansan Eight


SRE
Amazon Web Services (2015)

JAWS-UG (10 )

2014
Your Business Network
Eight DSOCSansan
SNS
Your business network
Eight AWS
Web/API
KPI
DB NW
Web/API
Feed Recommend
Agenda
Eight


Eight = 

Dev Day Tokyo 2017 

https://d1.awsstatic.com/events/jp/2017/summit/devday/D4T8-5.pdf
KPI
Eight
[Figure: log platform architecture diagram. Components shown: Servers / Applications, Cloud Service, Aggregator, S3 (store), Elasticsearch (view), Redshift (view/store), Redash (view), Athena (view), Kinesis Data Streams, Lambda, DynamoDB, SQS, ElastiCache, EC2 (Poll).]
Eight
fluentd
Redshift
Redshift or Redash
Applications Cloud Service Redshift
Redash
Analyst
fluentd + Kinesis Data Firehose
S3 + Redshift
Glue + Redshift + Redash +
Aggregators → Kinesis Data Firehose → S3 (Delivered logs) → Lambda (Classify by name) → S3 (Classified Logs) → Glue → Redshift → Redash → Analyst
fluent-plugin-kinesis
Kinesis Data Firehose
S3
Firehose
Lambda
Firehose
S3
Redshift
S3
Kinesis Data Firehose
Lambda (Classify by name)
Glue
Redshift
Kinesis Data Streams Kinesis Data Firehose
S3
Lambda
1 -1Firehose DeliveryStream
Lambda Redshift COPY
OK
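The "Classify by name" Lambda step can be sketched roughly as follows. This is only an illustration under assumptions: the slides do not show the record format, so the newline-delimited JSON input and the `name` key are hypothetical, and the S3 read/write that a real Lambda handler would perform is left out.

```python
import json
from collections import defaultdict


def classify_by_name(raw: str, name_key: str = "name") -> dict:
    """Group newline-delimited JSON log records by their log name.

    `name_key` is a hypothetical field; the talk does not say how the
    log name is carried inside each record.
    """
    groups = defaultdict(list)
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Keep broken records in their own group instead of dropping them.
            groups["unparsable"].append(line)
            continue
        if isinstance(record, dict):
            name = str(record.get(name_key, "unknown"))
        else:
            name = "unknown"
        groups[name].append(line)
    return dict(groups)
```

Each resulting group could then be written under its own S3 prefix, so that a downstream job reads one log type at a time.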
Glue
S3 Athena Redshift Spectrum
AWS Glue
Full Managed ETL Service
Glue Redshift
Glue
DPU 0.44 1 ETL 10
5DPU 2DPU
DPU 0.44 1 10
2DPU
1
0.44(USD/DPU hour) * 10/60(hour) * 2(DPU) = 0.147(USD)
50
50 * 0.44(USD/DPU hour) * 10/60(hour) * 2(DPU) = 7.333(USD)
30
$5,280
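The arithmetic above can be reproduced with a short script. The price (0.44 USD per DPU-hour), the 10-minute billing unit, and the 2 DPU / 50 job figures come from the slides; the assumption that each job runs once per hour (24 runs per day) for 30 days is an inference from the $5,280 total, not something the surviving text states.

```python
DPU_HOUR_USD = 0.44        # price per DPU-hour quoted in the slides
MIN_BILLED_MINUTES = 10    # billing unit per run as read from the slides


def job_run_cost(dpus: int, minutes: float) -> float:
    """Cost in USD of one job run, billed for at least MIN_BILLED_MINUTES."""
    billed_minutes = max(minutes, MIN_BILLED_MINUTES)
    return DPU_HOUR_USD * (billed_minutes / 60) * dpus


def monthly_cost(jobs: int, dpus: int, minutes: float,
                 runs_per_day: int, days: int) -> float:
    """Total cost for `jobs` jobs, each run `runs_per_day` times for `days` days."""
    return jobs * job_run_cost(dpus, minutes) * runs_per_day * days


one_run = job_run_cost(dpus=2, minutes=10)
print(f"one job run:       {one_run:.3f} USD")
print(f"50 jobs, one run:  {50 * one_run:.3f} USD")
# Assumed schedule: hourly runs (24 per day) over 30 days.
print(f"50 jobs, 30 days:  {monthly_cost(50, 2, 10, 24, 30):,.0f} USD")
```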
76 1
1 20 25
IOwait
S3 Read / Redshift Write
Python concurrent.futures.ThreadPoolExecutor
4DPU15 5 5 1
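Because the work is dominated by I/O wait (S3 reads and Redshift writes), running several tables concurrently inside one job with `concurrent.futures.ThreadPoolExecutor` is a natural fit. The sketch below shows only the threading pattern: `load_table` is a hypothetical stand-in that sleeps instead of doing real I/O, and the worker count of 5 is an illustrative value.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_table(table: str) -> str:
    """Hypothetical stand-in for one S3 read + Redshift write."""
    time.sleep(0.1)  # mimic I/O wait
    return f"{table}: done"


def load_all(tables, max_workers: int = 5) -> dict:
    """Run load_table for every table on a thread pool and collect results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(load_table, t): t for t in tables}
        for future in as_completed(futures):
            table = futures[future]
            try:
                results[table] = future.result()
            except Exception as exc:
                # Record the failure but keep loading the other tables.
                results[table] = f"failed: {exc}"
    return results
```

Threads suit this case because the workers spend most of their time blocked on the network rather than on the CPU.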
Redshift
1
Job Bookmark
choice
Glue
int/long 100 3458395800 1
string/ value JSON
long/string ID 'undefined'
Glue (DynamicFrame)
choice Redshift NULL
information_schema.columns resolveChoice
{"user_id": 3234567890, "device_type": "iOS", "logged_in": 1234567890}
{"user_id": "123", "device_type": "iOS", "logged_in": 1234567890}
{"user_id": 12345, "device_type": "Android", "logged_in": 1234567890}
{"user_id": {"int": null, "long": 3234567890}, "device_type": "iOS", "logged_in": 1234567890}
{"user_id": {"int": "123", "long": null}, "device_type": "iOS", "logged_in": 1234567890}
{"user_id": {"int": 12345, "long": null}, "device_type": "Android", "logged_in": 1234567890}
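One way to read the "information_schema.columns resolveChoice" note is that the target column types are looked up in Redshift and turned into `resolveChoice` specs. A minimal sketch of that idea follows; the type mapping table and the function name are assumptions for illustration, and timestamp columns are deliberately left out of the mapping.

```python
# Assumed mapping from Redshift data_type values to resolveChoice actions.
REDSHIFT_TO_GLUE_CAST = {
    "bigint": "cast:long",
    "integer": "cast:int",
    "double precision": "cast:double",
    "character varying": "cast:string",
}


def build_resolve_choice_specs(columns):
    """Build (column, action) pairs from (column_name, data_type) rows.

    `columns` is expected to look like the result of
    SELECT column_name, data_type FROM information_schema.columns ...
    Columns whose type is not in the mapping are skipped.
    """
    specs = []
    for name, data_type in columns:
        action = REDSHIFT_TO_GLUE_CAST.get(data_type)
        if action is not None:
            specs.append((name, action))
    return specs


# Inside a Glue job the result would be passed along the lines of:
#   frame = frame.resolveChoice(specs=build_resolve_choice_specs(rows))
```

With the `user_id` example above, a spec of `("user_id", "cast:long")` would collapse the int/long choice into a single long column.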
timestamp
UNIX timestamp “YYYY-mm-dd HH:MM:SS” NG
or OK
“YYYY-mm-dd HH:MM:SS” JSON string
COLUMN_NAME_string COLUMN_NAME
SELECT S3 Glue
timestamp ApplyMapping source - target
COLUMN_NAME_string
{"user_id": 1234567, "device_type": "iOS", "logged_in": "2018-11-01 01:00:00"}
{"user_id": "123", "device_type": "iOS", "logged_in": "2018-11-01 03:00:00"}
{"user_id": 12345, "device_type": "Android", "logged_in": "2018-11-01 05:20:00"}
id | bigint | not null default…
user_id | bigint | not null
device_type | integer | not null
logged_in | timestamp without time zone | not null
id | bigint | not null default…
user_id | bigint | not null
device_type | integer | not null
logged_in | timestamp without time zone | not null
logged_in_string | character varying(255) |
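The `COLUMN_NAME_string` approach can be expressed as an `ApplyMapping` mapping list in which string-typed timestamp fields are routed to a `<column>_string` target. The helper below is a sketch under assumptions: its name, its inputs, and the rule for picking columns are hypothetical, and only the 4-tuple shape `(source, source_type, target, target_type)` follows the Glue `ApplyMapping` convention.

```python
def build_mappings(source_fields, timestamp_columns):
    """Build ApplyMapping tuples: (source, source_type, target, target_type).

    source_fields: iterable of (name, glue_type) for the incoming frame.
    timestamp_columns: names of Redshift timestamp columns.
    A timestamp column that arrives as a string is sent to
    `<name>_string`; every other field is passed through unchanged.
    """
    mappings = []
    for name, glue_type in source_fields:
        if name in timestamp_columns and glue_type == "string":
            mappings.append((name, "string", f"{name}_string", "string"))
        else:
            mappings.append((name, glue_type, name, glue_type))
    return mappings


# In a Glue job this would be used roughly as:
#   mapped = ApplyMapping.apply(frame=frame, mappings=build_mappings(...))
```

The string value then lands in `logged_in_string` as in the second table definition above, from which the timestamp column can be filled afterwards.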
Not Null
Redshift Not Null
information_schema.columns
NULL
S3 to Glue
GlueContext.create_dynamic_frame.from_catalog
Glue
GlueContext.create_dynamic_frame.from_options
S3
S3 key
pyspark.sql.functions.input_file_name()
from_catalog
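For comparison, reading directly from S3 with `from_options` means the job itself decides which keys to load. The sketch below only builds the arguments; the bucket name, the `<log_name>/YYYY/MM/DD/HH/` key layout, and the helper name are hypothetical, since the slides do not show the actual prefix structure or which of the two readers the talk settled on.

```python
from datetime import datetime, timezone


def s3_source_options(bucket: str, log_name: str, hour: datetime) -> dict:
    """Build keyword arguments for create_dynamic_frame.from_options.

    Assumes (hypothetically) that classified logs are stored under
    s3://<bucket>/<log_name>/YYYY/MM/DD/HH/.
    """
    prefix = hour.strftime("%Y/%m/%d/%H")
    return {
        "connection_type": "s3",
        "connection_options": {
            "paths": [f"s3://{bucket}/{log_name}/{prefix}/"],
            "recurse": True,
        },
        "format": "json",
    }


# In a Glue job:
#   opts = s3_source_options("example-bucket", "login", some_hour)
#   frame = glue_context.create_dynamic_frame.from_options(**opts)
```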
Glue to Redshift
GlueContext.write_dynamic_frame.from_jdbc_conf
COPY
spark-redshift
DataFrame
information_schema
DynamicFrame DropNullFields
COPY
TRUNCATECOLUMNS
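If the write goes through `write_dynamic_frame.from_jdbc_conf`, extra COPY options such as TRUNCATECOLUMNS can be passed through the Redshift connection option `extracopyoptions`. The helper below is a sketch, not the configuration used in the talk: the function name and all argument values are placeholders.

```python
def redshift_write_kwargs(catalog_connection: str, database: str,
                          table: str, tmp_dir: str) -> dict:
    """Build keyword arguments for write_dynamic_frame.from_jdbc_conf.

    TRUNCATECOLUMNS makes COPY truncate over-long strings to the column
    width instead of failing the load.
    """
    return {
        "catalog_connection": catalog_connection,
        "connection_options": {
            "database": database,
            "dbtable": table,
            "extracopyoptions": "TRUNCATECOLUMNS",
        },
        "redshift_tmp_dir": tmp_dir,
    }


# In a Glue job:
#   glue_context.write_dynamic_frame.from_jdbc_conf(
#       frame=frame, **redshift_write_kwargs(...))
```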
Job Bookmark
S3
Redshift
Glue
CloudWatch Logs /aws-glue/jobs/error
Glue 10
Trigger
1 1 3
information_schema.columns
Redshift
Redshift SELECT
Job Bookmark
1 1 Max concurrency = 1
[Figure: architecture diagram. Components shown: Servers / Applications, Aggregator, S3 (store), Elasticsearch (view), Redshift (view/store), Redash (view), Athena (view), Kinesis Data Streams, Lambda, DynamoDB, SQS, ElastiCache, EC2 (Poll); log path: Kinesis Data Firehose → S3 → Lambda → S3 → Glue.]
Personalized Feed
Feed Feed
Aurora
Eight's Behavior Log Utilization Platform Built with AWS Services