김준형 / AWS Solutions Architect
김성규, Senior Engineer / Samsung Electronics Server Engineering
Building Global Services
on Amazon DynamoDB
Agenda
• Amazon DynamoDB Overview
• New Amazon DynamoDB Features
• Encryption at Rest
• Backup & Restore
• Global Tables
• Customer Case Study (Samsung Electronics: Moving a Galaxy into the Cloud)
Amazon DynamoDB
Overview
Amazon DynamoDB Features
A fast, flexible NoSQL database service at any scale
Fast, consistent performance
• Latency guaranteed under 10 ms
• Down to microseconds with DAX
High scalability
• Millions of requests per second
• Automatic scaling to hundreds of TB of storage
Fully managed service
• Automatic provisioning and infrastructure management
High reliability
• Data replicated across multiple AZs within a region
• Fine-grained access control
Amazon DynamoDB Customers (100,000+)
New Features Added in 2017
• Time to Live (TTL): February 2017
• VPC Endpoints: April 2017
• DynamoDB Accelerator (DAX): April 2017
• Auto Scaling: July 2017
And announced as NEW in 2017:
• Global tables (launched)
• On-demand backup (launched)
• Encryption at rest (coming soon)
Time to Live (TTL)
ID    Name  Size  Expiry (TTL attribute)
1234  A     100   1456702305
2222  B     240   1456702400
3423  C     150   1459207905
Features
• Automatically deletes items from a table based on an expiry timestamp
• User-defined TTL attribute in epoch time format
• TTL activity is recorded in DynamoDB Streams
Key benefits
• Cut costs by deleting items that are no longer needed
• Keep table growth in check and optimize application performance
• Trigger custom workflows with DynamoDB Streams and Lambda
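The TTL attribute is just an epoch-seconds timestamp stored on the item. A minimal sketch of computing one, using the `Expiry` attribute name from the table above:

```python
import time

def ttl_epoch(days_from_now, now=None):
    """Return an epoch-seconds timestamp suitable for a DynamoDB TTL attribute."""
    if now is None:
        now = time.time()
    return int(now + days_from_now * 86_400)

# Item shaped like the slide's table, expiring 30 days from now.
item = {"ID": "1234", "Name": "A", "Size": 100, "Expiry": ttl_epoch(30)}
# Enabling TTL on the table is a single UpdateTimeToLive call with
# {"Enabled": True, "AttributeName": "Expiry"}.
```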
Auto Scaling
$$$ Cost savings
Fully managed: automatically scales up when needed
and scales down when it isn't.
Enabled by default.
Scheduled Auto Scaling has also been released.
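Under the hood, DynamoDB Auto Scaling is driven by the Application Auto Scaling service. A sketch of the two request shapes involved; the table name, capacity limits, and target utilization below are illustrative assumptions:

```python
# Target to scale: a table's write capacity, between 5 and 1,000 WCU.
scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 5,
    "MaxCapacity": 1000,
}

# Target-tracking policy: keep consumed/provisioned WCU near 70%.
scaling_policy = {
    "PolicyName": "MyTable-wcu-tracking",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyTable",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}
# With boto3 these would be passed to
# client("application-autoscaling").register_scalable_target(**scalable_target)
# and .put_scaling_policy(**scaling_policy).
```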
Amazon Virtual Private Cloud (VPC) Endpoints
Features
• Secure access to DynamoDB through a VPC endpoint
• Control table access through the VPC endpoint with separate IAM roles and permissions
Key benefits
• Keeps traffic off the public internet gateway, improving privacy and security
• Secure, fast data transfer between your VPC and DynamoDB
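A DynamoDB gateway endpoint is created through the EC2 API. A sketch of the request shape, with placeholder VPC and route-table IDs:

```python
# Shape of an EC2 CreateVpcEndpoint request for a DynamoDB gateway endpoint;
# the IDs below are placeholders.
endpoint_request = {
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-1.dynamodb",
    "RouteTableIds": ["rtb-0123456789abcdef0"],
}
# With boto3: client("ec2").create_vpc_endpoint(**endpoint_request).
# An IAM policy can then pin table access to this endpoint via the
# aws:SourceVpce condition key.
```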
DynamoDB Accelerator (DAX)
Features
• Fully managed, highly available: replication across multiple AZs within a region
• DynamoDB API compatible: caches DynamoDB API calls, so no application rewrite is needed
• Flexible: configure DAX for one or more tables
• Scalable: scale out with up to 10 read replicas
• Secure: Amazon VPC, AWS IAM / CloudTrail / Organizations
• Manageable: fully integrated with Amazon CloudWatch, DynamoDB tags, and the AWS Console
Diagram: Your applications → DynamoDB Accelerator → DynamoDB (Table #1, Table #2)
DAX: fast responses with in-memory performance and throughput
Milliseconds to microseconds: 5 ms average down to 0.2 ms average
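DAX sits in front of DynamoDB as a read-through (and write-through) cache. A toy sketch of the read-through half of that pattern, with the table faked by a plain dict standing in for GetItem calls:

```python
# A toy read-through cache: the pattern DAX applies in front of DynamoDB.
class ReadThroughCache:
    def __init__(self, backing_get):
        self._get = backing_get  # e.g. a function wrapping DynamoDB GetItem
        self._cache = {}

    def get(self, key):
        if key not in self._cache:            # miss: millisecond-scale fetch
            self._cache[key] = self._get(key)
        return self._cache[key]               # hit: served from memory

table = {"1234": {"Name": "A", "Size": 100}}  # stand-in for a DynamoDB table
dax_like = ReadThroughCache(table.__getitem__)
first = dax_like.get("1234")   # fetched from the backing table
again = dax_like.get("1234")   # served from the cache
```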
New Features Announced at AWS re:Invent 2017
Backup & Restore
Until now…
• Data Pipeline and open-source tools
• Replication using Amazon EMR
• Data movement via cross-region replication
…all requiring additional resources and configuration
The first NoSQL DB to offer on-demand & continuous backup (NEW!)
• On-demand backup: long-term data retention and compliance; back up hundreds of TB instantly with no performance impact
• Point-in-time recovery (planned for early 2018): short-term data retention, protection against data corruption
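On-demand backup is driven by two API calls, CreateBackup and RestoreTableFromBackup. A sketch of the request shapes; the table and backup names are illustrative, and the truncated ARN is a placeholder:

```python
# Request shapes for DynamoDB's on-demand backup APIs.
create_backup_request = {
    "TableName": "MyTable",
    "BackupName": "MyTable-2018-01-15",
}
restore_request = {
    "TargetTableName": "MyTable-restored",
    "BackupArn": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/backup/...",
}
# With boto3: client("dynamodb").create_backup(**create_backup_request),
# then .restore_table_from_backup(**restore_request) into a new table.
```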
DynamoDB Global Tables (NEW!)
A fully managed, multi-master, multi-region database
• High-performance, globally distributed applications
• Low read/write latency guaranteed locally
• Disaster recovery via multi-region replication
• Easy setup; no application changes required
• Currently available in US East (2), US West (1), and EU (2) regions
Users around the world ↔ Global Table
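The 2017 launch exposes global tables through a CreateGlobalTable call; each listed region must already contain an identically named table with streams enabled. A sketch of the request shape (table name and region set are illustrative):

```python
# Shape of the (2017-version) CreateGlobalTable request.
global_table_request = {
    "GlobalTableName": "MyTable",
    "ReplicationGroup": [
        {"RegionName": "us-east-1"},
        {"RegionName": "us-west-2"},
        {"RegionName": "eu-west-1"},
    ],
}
# With boto3: client("dynamodb").create_global_table(**global_table_request)
```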
Encryption at Rest (coming soon) (NEW!)
Server-side encryption
• No application code rewrite required
• Supports compliance certifications such as HIPAA, PCI, and ISO
Amazon DynamoDB with AWS KMS: sensitive data protected across APP1 / APP2 / APP3
Customer Case Study (Samsung Electronics)
Moving a Galaxy into the Cloud
Best Practices from Samsung on Migrating to Amazon DynamoDB
Samsung Cloud Service
• Storage service providing backup/restore and a key-value store for mobile applications
• Backup and restore data and settings: home screen, app data, contacts, messages, device settings, music, documents, etc.
• Your photos on multiple devices, any time: sync photos, videos, and notes using native applications across Samsung devices
• 15 GB of free storage; upgrade for more
• Available in the US, 29 EU countries, Korea, etc.
DynamoDB Usage Today
DynamoDB usage is growing steadily
500 tables, 3.5M RCU, 3M WCU, 860TB Storage in total
Growth rate (YoY) - RCU: 136%, WCU: 175%, Storage: 226% (2017. 7)
NoSQL Database Usage in 2014
Cassandra Cluster
• Cassandra ring : > 100 i2.8xlarge instances (50% On-demand, 50% Reserved Instance)
Challenges
• High cost of operations and resources
• Unstable consistency
Requirements
• Providing indexes
• Real-time query response
• Enables large data size
• Efficient and fast scale-out
• Easy and secure operation
Could DynamoDB be the answer?
Phase 1: Evaluation
Scalability
Requirements
• > 20K concurrent connections, > 100TB table size
DynamoDB
• DynamoDB does not monitor or limit the connection count; it only limits throughput to what the user provisions
• There are no limits on the request capacity or storage size for a given table
Phase 1: Evaluation
Performance
Requirements
• Consistent latency at scale
DynamoDB
• No storage capacity limitation, no latency performance impact from large amounts of data and
transactions
Phase 1: Evaluation
Reliability
Requirements
• Amazon S3 level availability & durability, Backup and Recovery
DynamoDB
• Fault tolerance in the event of a server failure or Availability Zone outage
• Synchronously replicates data across three facilities within an AWS Region
• Export / Import as a full backup
Phase 1: Evaluation
Security
Requirements
• Data encryption at rest, DB access logs
DynamoDB
• DynamoDB is already secure by default: AWS-managed infrastructure, IAM for access control, and encryption in transit
• Data encryption at rest is handled with client-side encryption using KMS
Phase 1: Evaluation
Cost
Instance Type Spec
• i2.8xlarge : 365K Read IOPS, 315K First Write IOPS (with 4KB blocksize)
• c4.8xlarge : 48K IOPS (with 16KB blocksize), 500Mbps
Usage
• i2.8xlarge, 10K Read IOPS, 2K Write IOPS (with 40KB)
Results
• Replaced i2.8xlarge instances with c4.8xlarge instances plus 8 x 1TB EBS gp2 volumes
• i2.8xlarge $5,500/month vs. c4.8xlarge + gp2 $2,568/month
• > 50% cost savings
Lessons
• As data grows, instances are used for storage capacity rather than IOPS. Opportunity to optimize
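A quick check of the instance-cost numbers above:

```python
# The slide's instance-cost comparison, reproduced.
i2_monthly = 5500        # i2.8xlarge, USD/month
c4_gp2_monthly = 2568    # c4.8xlarge + 8 x 1TB gp2, USD/month
savings = 1 - c4_gp2_monthly / i2_monthly  # ~0.53, i.e. the "> 50%" claim
```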
Phase 1: Evaluation (Contd.)
Cost Comparison
Capacity
• Total = Used 230TB / Physical 512TB (800GB * 8 instance store volumes * 80 instances)
• DynamoDB indexed data storage capacity = 80TB ( = 230TB / 3 replication factor)
Calculate Capacity Unit
• 43K reads / second = 3,700M calls per day
• 14K writes / second = 1,250M calls per day
• 43K RCU, 14K WCU (Item size < 1KB, strong consistency)
Cost
• > 90% cost savings without RI/RC
• Not fully realized: 60~70% cost savings in reality
Lessons
• Provisioned throughputs might not be 100% utilized. Consider % utilization for cost comparisons
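The capacity-unit arithmetic above, spelled out. It assumes strongly consistent reads and items under 1KB, so one read costs 1 RCU and one write 1 WCU:

```python
# Reproducing the slide's capacity-unit calculation.
reads_per_sec, writes_per_sec = 43_000, 14_000
read_calls_per_day = reads_per_sec * 86_400    # 3,715,200,000 ~= 3,700M
write_calls_per_day = writes_per_sec * 86_400  # 1,209,600,000 (slide quotes ~1,250M)
rcu, wcu = reads_per_sec, writes_per_sec       # 43K RCU, 14K WCU

# Cassandra stores 3 replicas; DynamoDB bills the logical (indexed) size:
dynamodb_storage_tb = 230 / 3                  # ~77TB (quoted as 80TB)
```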
Phase 2: Testing
YCSB (Yahoo! Cloud Serving Benchmark)
• Open source benchmark tool for NoSQL : DynamoDB, Cassandra, Couchbase, etc
• Wiki: https://github.com/brianfrankcooper/YCSB
Core Workloads
• Sets of pre-defined properties: readproportion, insertproportion, requestdistribution,
operationcount
• Workload A: Update heavy workload (50/50 reads and writes)
• Workload B: Read mostly workload (95/5 reads/write)
• Workload C: Read only (100% read)
• Workload D: Read latest workload
• Workload E: Short ranges
• Workload F: Read-modify-write
Phase 2: Testing
Test results (item count: 100M, item size < 1KB, 4 clients)
• Throughput was increased from 40K to 80K,
but DynamoDB performance does not scale for workloads A, B, C, and F.

RCU/WCU: 80K (strong consistency)
                          A       B       C       F       D       E
Consumed read capacity    7,000   14,000  14,000  14,000  80,000  22,000
Consumed write capacity   14,500  1,500   -       14,000  9,000   207

RCU/WCU: 40K (strong consistency)
                          A       B       C       F       D       E
Consumed read capacity    8,000   14,000  14,000  14,000  40,000  -
Consumed write capacity   16,500  1,500   -       14,600  4,500   -
Phase 2: Testing
Core Workloads: sets of core properties
• Workload A: Update heavy workload, requestdistribution=zipfian
• Workload B: Read mostly workload, requestdistribution=zipfian
• Workload C: Read only, requestdistribution=zipfian
• Workload D: Read latest workload, requestdistribution=latest
• Workload E: Short ranges, requestdistribution=zipfian
• Workload F: Read-modify-write, requestdistribution=zipfian
Zipfian distribution
• Requests popular items more often
• Doesn't spread the workload across DynamoDB partitions
• Doesn't matter for a small table with few partitions (with 1K RCU/WCU)
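The skew is easy to see by sampling. A rough sketch using Zipf-like weights 1/rank^s; note YCSB's actual zipfian constant (0.99) differs slightly, so this is illustrative only:

```python
import collections
import random

def sample_zipf(n_keys, n_samples, s=1.0, seed=42):
    """Draw keys with Zipf-like weights 1/rank**s (illustrative skew)."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** s for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_samples)

counts = collections.Counter(sample_zipf(1000, 10_000))
top_share = counts.most_common(1)[0][1] / 10_000
# The hottest key absorbs a double-digit share of all requests, so traffic
# concentrates on a handful of DynamoDB partitions instead of spreading out.
```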
Phase 2: Testing
Recommended workload distribution set to ‘uniform’
• https://github.com/brianfrankcooper/YCSB/tree/master/dynamodb
• Best practices for using DynamoDB - uniform, evenly distributed workload is the recommended
pattern for scaling and getting predictable performance out of DynamoDB
Phase 3: Design – Architecture
Before: Mobile Client → External ELB → Sync API Server (record-type sync server) → Cassandra Cluster, with S3 for large data and Memcached for user locks
After: Mobile Client → External ELB → Sync API Server → DynamoDB, with S3 for large data and RDS for schema info
Phase 3: Design – Table
Function tables
• Partition key: composite_key
• Sort key: record_id
• A single big table, so items can be put and queried at once
Composite key: user_id + service_id + unique_id
• user_id: random alphanumeric
• service_id: unique values per service
• unique_id: unique values per record
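The composite partition key is a simple concatenation of the three parts. A sketch; the `#` delimiter below is an illustrative assumption, not necessarily the production format:

```python
# Building the function-table partition key: user_id + service_id + unique_id.
def composite_key(user_id, service_id, unique_id):
    return f"{user_id}#{service_id}#{unique_id}"

# Random-alphanumeric user_id spreads items across partitions.
key = composite_key("a1b2c3", "notes", "rec-001")
```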
Phase 3: Design – Table
Contents tables
• Partition key: user_id
• Sort key: record_id
• > 70 small tables, one per service, to provision different throughput for each service
e.g.)
• More popular table (> 50TB): 45K WCU / 200K RCU
• Less popular (< 4GB): 50 WCU / 300 RCU
Local secondary index
• Partition key: user_id, sort key: update_time, attribute: record_id
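A contents table with this local secondary index could be declared with a CreateTable request shaped like the following. The table name, index name, attribute types, and projection are illustrative assumptions; the throughput uses the slide's "less popular" numbers:

```python
# Shape of a DynamoDB CreateTable request for a contents table with the
# user_id + update_time LSI described above.
contents_table = {
    "TableName": "ContentsTable",
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "record_id", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "record_id", "AttributeType": "S"},
        {"AttributeName": "update_time", "AttributeType": "N"},
    ],
    "LocalSecondaryIndexes": [{
        "IndexName": "user_id-update_time-index",
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "update_time", "KeyType": "RANGE"},
        ],
        # KEYS_ONLY projects the table keys, so record_id is queryable
        # by update_time, as in the slide.
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    "ProvisionedThroughput": {"ReadCapacityUnits": 300, "WriteCapacityUnits": 50},
}
```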
Phase 4: Data Migration
Online migration
• Full migration is not possible (tables are hundreds of TB in size): migrate per user
• Some users are in Cassandra while others are in DynamoDB: storage path DB
• To minimize the impact on each user, migrate each one as quickly as possible: accelerate migration
Storage path DB: User A → <path_to_cassandra>, User B → <path_to_dynamodb>
Mobile clients reach the app servers, which consult the storage path DB: not-yet-migrated users (User A) keep their normal data flow to Cassandra, migrated users (User B) flow to DynamoDB, while the migration data flow copies users across
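The storage path DB amounts to a per-user routing table. A minimal sketch, with illustrative store names:

```python
# Per-user routing during online migration: the "storage path DB" records
# which store currently holds each user's data.
storage_path = {"UserA": "cassandra", "UserB": "dynamodb"}

def route(user_id):
    # Not-yet-migrated users stay on Cassandra; migrated and new users go
    # to DynamoDB.
    return storage_path.get(user_id, "dynamodb")

def finish_migration(user_id):
    storage_path[user_id] = "dynamodb"  # flip the path once the copy is done
```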
Phase 4: Data Migration
Low utilization
• Simultaneous write calls and batch deletions during migration: spiky workload pattern
• For instance, a user could hit only 2~3 partitions while writing 1K~10K items
User A's items hit only a few of partitions #1 … #N, even with multiple threads
Phase 4: Data Migration
Solutions
• Decrease the thread count for proper load control
• Reduce the number of items read and written at once
• Improvements: throttled events reduced by > 90%, utilization increased from ~45% to ~80%, migration speed accelerated
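Both fixes throttle the migration writer. The batching half can be sketched as a chunking helper; BatchWriteItem caps a request at 25 items, and going smaller smooths the spiky load (the batch size below is an illustrative choice):

```python
# Split a user's items into small batches to smooth the write load.
def chunk(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

BATCH_SIZE = 10  # well under the 25-item BatchWriteItem limit
batches = list(chunk(list(range(95)), BATCH_SIZE))  # 10 batches, last holds 5
```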
Phase 5: Operation
Diluted partitions
• As storage capacity and provisioned throughput grow, partitions are added automatically
• But the partition count never shrinks
• Avoid diluted-partition situations
Example: create the table, then raise WCU to 1,000K for migration → 1,000 partitions at 1,000 WCU each. Lower WCU to 100K afterwards → the same 1,000 partitions now get only 100 WCU each, 1/10 of the per-partition WCU.
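The per-partition arithmetic above can be reproduced directly. The partition-count formula below is the commonly cited legacy heuristic from that era, not an official API guarantee:

```python
import math

# Legacy heuristic: partitions ~= max(ceil(RCU/3000 + WCU/1000), ceil(GB/10)).
def partitions_for(rcu, wcu, size_gb=0):
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)

peak = partitions_for(0, 1_000_000)  # 1,000 partitions during migration
per_partition_before = 1_000_000 / peak  # 1,000 WCU per partition
per_partition_after = 100_000 / peak     # 100 WCU per partition after lowering
```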
Phase 5: Operation
Data Backup
• Backup requirements: RPO 1 day, RTO 7 days for the internal SLA
• Export/Import is not supported in SIN (Data Pipeline is not available there)
• Backing up large tables (> 40TB) fails with Export/Import
Solutions
• Daily full backup with custom scripts on EMR/Hive: still a cost challenge
• DynamoDB Backup and Restore is now available
After Migration
Results
Timeline: Evaluation (7 months, from Feb. 2015) → Testing (1 month, Aug. 2015) → Modeling (1 month, Sep. 2015) → Migration (4 months, from Oct. 2015) → Operation (> 2 years, from Feb. 2016)
Dec. 2015: Cassandra, > 100 i2.8xlarge instances → Dec. 2017: DynamoDB, > 250K WCU, > 1M RCU
~40% cost savings (excluding transfer, service costs, etc.)
Realized huge cost savings by migrating to DynamoDB!
Benefits of using DynamoDB
• Successfully launched the Samsung Cloud service, supporting massive-scale workloads for Samsung Galaxy smartphones
• 40% savings in NoSQL infrastructure cost
• No capacity planning for petabyte-scale storage with on-demand capacity
• Consistent performance at tens of millions of capacity units of throughput
• Zero administration for hundreds of tables with DynamoDB Auto Scaling
• No failures during 2+ years of operation with a fully managed service
• No data corruption or loss across billions of items
• Enterprise-level security and compliance using VPC endpoints for DynamoDB
Lessons learned
Evaluation
• TCO drives technology adoption / innovation
• As data grows, instances are used for storage capacity rather than IOPS. Chances to optimize.
• Provisioned throughputs might not be 100% utilized. Consider utilization
Table Design
• Both “the primary key selection” and “The workload patterns on individual items” are important
Data Migration
• Enable online migration by migrating per user using “storage path DB”
• Handle spiky workloads
• Go back to table design to spread workloads across partitions
Operations
• Avoid Diluted Partitions
After this session, check out the new Amazon DynamoDB features:
• Global Tables
https://aws.amazon.com/ko/dynamodb/global-tables/
• On-demand Backup & Restore
https://aws.amazon.com/ko/dynamodb/backup-restore/
• Auto Scaling
https://docs.aws.amazon.com/ko_kr/amazondynamodb/latest/developerguide/AutoScaling.html
• TTL (Time to Live)
https://docs.aws.amazon.com/ko_kr/amazondynamodb/latest/developerguide/TTL.html
• VPC Endpoints
https://docs.aws.amazon.com/ko_kr/amazondynamodb/latest/developerguide/vpc-endpoints-dynamodb.html
• DAX (DynamoDB Accelerator)
https://docs.aws.amazon.com/ko_kr/amazondynamodb/latest/developerguide/DAX.html
Thank you
AWS Cloud 2018: Building Global Services on Amazon DynamoDB (김준형, AWS Solutions Architect)