Introduction to Amazon SageMaker Model Deployment::Daekeun Kim, AI/ML Specialist Solutions Architect, AWS::AWS AIML Special Webinar

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
Learning Model Serving Patterns with AWS Experts
Amazon SageMaker Deployment
Daekeun Kim
AIML Specialist Solutions Architect
AWS

Overview

What does it take to build model serving from scratch?
[Diagram: a VPC spanning Availability Zone 1 and Availability Zone 2, each with a NAT gateway and instances in an Amazon EC2 Auto Scaling group]
• Infrastructure setup
• High availability
• How do you manage framework versions?
• What if traffic spikes?
• A/B testing
• What about security?
• What if you want to cut costs?
• ...
CEO A: Let's hire OO super ultra engineers from outside!
CEO B: Let's train O engineers well for two weeks!

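What are the benefits of cloud-native model serving?
• Endpoints you can start in minutes
• 99.99% service availability SLA
• CI/CD: SageMaker Pipelines and Projects
• Infrastructure management, patching, and updates built in
• More than 70 SageMaker machine learning instance types
• Model Registry: model catalog, version management, approval workflows
• Metrics and log collection for endpoints in Amazon CloudWatch
• Traffic-based autoscaling
• Model Monitor: alerts on data drift and model drift
• Multiple models deployed to one endpoint
Themes: easy model deployment and management · MLOps integration · best price-performance trade-off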

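SageMaker inference portfolio
[Diagram: the layers of the Amazon SageMaker inference stack]
• Amazon SageMaker (SageMaker Studio IDE): real-time inference, asynchronous inference, batch inference, multi-model endpoints, multi-container endpoints, inference pipelines, model version management, CI/CD, model monitoring, CloudWatch metrics and logging
• Built-in machine learning frameworks: AWS Deep Learning Containers
• Model servers: TensorFlow Serving, TorchServe, NVIDIA Triton Inference Server, AWS Multi Model Server (MMS), Nginx + gunicorn
• Compute accelerators: SageMaker Neo, NVIDIA TensorRT/cuDNN, Intel oneDNN, ARM Compute Library
• Machine learning instances: CPUs, GPUs, Inferentia, Graviton (ARM)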

Four main SageMaker built-in model serving patterns
• Real-time inference: low latency, ultra-high throughput, A/B testing
• Batch inference: process large datasets, job-based system
• Asynchronous inference: near real-time, large payloads (1 GB), long timeouts (15 min)
• Serverless inference: automatic scaling, pay-per-use pricing

How built-in model serving works
Step 1: Create the Model (package the model for deployment). A SageMaker Model combines:
• Inference container image: the path of the SageMaker inference image stored in ECR or a private Docker registry
• Model artifact: the S3 path to the trained model artifact (**required for SageMaker built-in algorithms)
• IAM role
• Advanced configurations: the available options depend on the chosen deployment option, for example VPC configuration, multi-container deployment, and multi-model deployment
Step 2: Deploy the endpoint. The input is served through one of:
• Real-time inference
• Batch inference
• Asynchronous inference
• Serverless inference

A closer look at the four main patterns: real-time inference

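Baseline: single model
• All artifacts needed for model inference are stored on the web server
• Immediate responses for payloads of up to 6 MB
• 60-second timeout
• Autoscaling
[Diagram: SageMaker real-time endpoint. A request reaches the HTTPS endpoint, is load-balanced across ML instances in an Auto Scaling group, and is served by a web-serving container that holds the code and the model.]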

Three steps to create an endpoint
1. Create the Model
aws sagemaker create-model \
  --model-name model1 \
  --primary-container '{"Image": "123.dkr.ecr.amazonaws.com/algo", "ModelDataUrl": "s3://bkt/model1.tar.gz"}' \
  --execution-role-arn arn:aws:iam::123:role/me
2. Create the endpoint configuration (EndpointConfig)
aws sagemaker create-endpoint-config \
  --endpoint-config-name model1-config \
  --production-variants '{"InitialInstanceCount": 2,
    "InstanceType": "ml.m4.xlarge",
    "InitialVariantWeight": 1,
    "ModelName": "model1",
    "VariantName": "AllTraffic"}'
3. Create the Endpoint
aws sagemaker create-endpoint \
  --endpoint-name my-endpoint \
  --endpoint-config-name model1-config
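The same three steps can be sketched in Python. This is a minimal sketch, assuming the requests are later passed to boto3's SageMaker client (`create_model`, `create_endpoint_config`, `create_endpoint`); the helper name `build_endpoint_requests` is hypothetical, and the block only builds the request dictionaries so it runs without AWS credentials.

```python
def build_endpoint_requests(model_name, image, model_data_url, role_arn,
                            config_name, endpoint_name,
                            instance_type="ml.m4.xlarge", instance_count=2):
    """Return the keyword arguments for the three API calls (hypothetical helper)."""
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1,
        }],
    }
    # The endpoint only ties a name to an existing endpoint configuration.
    create_endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return create_model, create_endpoint_config, create_endpoint


model_req, config_req, endpoint_req = build_endpoint_requests(
    model_name="model1",
    image="123.dkr.ecr.amazonaws.com/algo",
    model_data_url="s3://bkt/model1.tar.gz",
    role_arn="arn:aws:iam::123:role/me",
    config_name="model1-config",
    endpoint_name="my-endpoint",
)

# Assumed next step, with credentials configured:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_req)
#   sm.create_endpoint_config(**config_req)
#   sm.create_endpoint(**endpoint_req)
```

Keeping the three requests separate mirrors the Model / EndpointConfig / Endpoint split: the endpoint request carries nothing but the two names.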

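SageMaker SDK core classes - Model
• Model: the general class that defines a model object. It associates the model artifact with the container used for serving and runs the deployment.
• FrameworkModel: a general class for defining ML framework models. It hosts user-defined code in S3 and sets the code location and configuration in the model's environment variables.
• TensorFlowModel, XGBoostModel, …: subclasses for a specific ML framework. They apply framework-specific serving configuration through environment variables and set the appropriate container for the framework.
• LinearLearnerModel, PCAModel, …: specialized classes for deploying models trained with SageMaker built-in algorithms. They apply the container image to use.
from sagemaker.tensorflow import TensorFlowModel
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MyRole')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')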

SageMaker SDK core classes - Predictor
from sagemaker.serializers import CSVSerializer
# deploy() returns a predictor object
xgb_predictor = xgb.deploy(initial_instance_count=1,
    instance_type='ml.m4.xlarge')
xgb_predictor.serializer = CSVSerializer()
predictions = xgb_predictor.predict(inf_data).decode('utf-8')

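SageMaker SDK core class relationships
[Diagram: define an Estimator (TensorFlow) → fit() → deploy() → predict(). The calls create the Model and Predictor objects. Through the Session, they map to resources in the SageMaker execution environment (managed by AWS): the training job, the Model, and the inference endpoint.]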

Summary of endpoint creation with the SageMaker SDK
1. Create Model (refers to the inference container image)
pytorch_model = PyTorchModel(model_data=zipped_model_path,
    role=get_execution_role(), framework_version='1.5',
    entry_point='inference.py', py_version='py3',
    predictor_cls=ImagePredictor)
2. Deploy (creates the endpoint)
predictor = pytorch_model.deploy(instance_type='ml.t3.medium',
    initial_instance_count=1)
3. Predict (runs prediction)
predictor.predict(payload)
inference.py implements four handler functions:
1. model_fn() -> model load
2. input_fn() -> input processing
3. predict_fn() -> predictions
4. output_fn() -> output processing
[Diagram: a PyTorch container on an EC2 instance (labeled t3.xlarge), built from the model artifacts and a SageMaker framework container image]
SageMaker built-in web servers
Each server receives Invocations (RESTful API) and hosts the model together with the inference code.
• MMS (Multi-Model Server): MXNet, PyTorch (<1.6.0), built-in algorithms (multi-model), XGBoost (multi-model), Scikit-learn (multi-model)
• TorchServe: PyTorch (>=1.6.0)
• Nginx + gunicorn: built-in algorithms, XGBoost, Scikit-learn
• TF Serving: TensorFlow

Multi-model endpoint
• Hosts multiple models in a single container
• Direct invocations of a target model
• Models are loaded dynamically from Amazon S3
• Cold starts
[Diagram: SageMaker multi-model endpoint (Mode: MultiModel). Requests with TargetModel = 'model3.tar.gz' reach the endpoint; the MME container on the ML instances loads the requested model from Amazon S3, which stores model1.tar.gz through model4.tar.gz.]

Multi-container endpoint
• Hosts up to 15 distinct containers
• Direct or serial invocation
• 1 GB payload, 15-minute timeout
• No cold starts
[Diagram: SageMaker multi-container endpoint (serial). Requests pass through the endpoint to the containers on the ML instances (Processing-1, Inference-1, Inference-2), each with its own code and model.]

Code snippets
Multi-container
container1 = {
    'Image': ecr_image1,
    'ContainerHostname': 'firstContainer'}  # ...
container2 = {
    'Image': ecr_image2,
    'ContainerHostname': 'secondContainer'}  # ...
sm.create_model(
    InferenceExecutionConfig={'Mode': 'Direct'},
    Containers=[container1, container2, ...], ...)
sm.create_endpoint_config()
sm.create_endpoint()
smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetContainerHostname='firstContainer',
    Body=body, ...)
Multi-model
container = {
    'Image': mme_supported_image,
    'ModelDataUrl': 's3://my-bucket/folder-of-tar-gz',
    'Mode': 'MultiModel'}
sm.create_model(
    Containers=[container], ...)
sm.create_endpoint_config()
sm.create_endpoint()
smrt.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetModel='model-007.tar.gz',
    Body=body, ...)

A closer look at the four main patterns: batch inference, asynchronous inference, serverless inference

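Batch Transform
• Runs inference over an entire dataset
• Suited to periodic inference on large volumes of data
• Transient resources (provisioned instances are terminated as soon as the job completes) → pay only for what you use
[Diagram: SageMaker Batch Transform. Input in Amazon S3 → agent (mini-batching) → container (code and model) on the ML instances → output in Amazon S3.]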
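Asynchronous inference endpoint
• Ideal for large payloads of up to 1 GB
• Timeouts of up to 15 minutes
• Autoscaling (can scale in to zero instances)
• Well suited to CV/NLP use cases
[Diagram: SageMaker asynchronous inference endpoint. The payload is uploaded to Amazon S3 (input/output); requests reach the HTTPS endpoint and wait in an internal queue; a web-serving container (code and model) on ML instances in an Auto Scaling group processes them; Amazon SNS delivers notifications.]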
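As a rough sketch of how the asynchronous pieces fit together: the endpoint configuration gains an `AsyncInferenceConfig` block (S3 output path and optional SNS topics), and each invocation points at a payload already uploaded to S3. The helper names, topic ARNs, bucket paths, and instance type below are hypothetical; the dictionaries are meant for boto3's `create_endpoint_config` and the runtime client's `invoke_endpoint_async`.

```python
def build_async_endpoint_config(config_name, model_name, output_s3_uri,
                                success_topic_arn, error_topic_arn,
                                instance_type="ml.m5.xlarge",
                                max_concurrent_per_instance=4):
    """Keyword arguments for create_endpoint_config with async inference enabled."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            "OutputConfig": {
                "S3OutputPath": output_s3_uri,  # results are written here
                "NotificationConfig": {
                    "SuccessTopic": success_topic_arn,
                    "ErrorTopic": error_topic_arn,
                },
            },
            "ClientConfig": {
                "MaxConcurrentInvocationsPerInstance": max_concurrent_per_instance,
            },
        },
    }


def build_async_invocation(endpoint_name, input_s3_uri,
                           content_type="application/json"):
    """Keyword arguments for invoke_endpoint_async: the payload lives in S3."""
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": content_type,
    }


async_config = build_async_endpoint_config(
    config_name="model1-async-config",
    model_name="model1",
    output_s3_uri="s3://my-bucket/async-output/",
    success_topic_arn="arn:aws:sns:us-east-1:123:success",  # illustrative ARNs
    error_topic_arn="arn:aws:sns:us-east-1:123:error",
)
async_call = build_async_invocation("my-async-endpoint",
                                    "s3://my-bucket/async-input/payload.json")
```

The request itself stays small because it carries only the S3 location, which is how large payloads fit into the pattern.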

Lambda serverless inference
• Requires building a Docker image and implementing the Lambda function
[Diagram: docker push uploads the container image to Amazon Elastic Container Registry (ECR). On CreateFunction (Status: PENDING), AWS Lambda (1) pulls the image from Amazon ECR, (2) optimizes the image, and (3) deploys the image to Lambda. The Lambda function is then ready to invoke (Status: ACTIVE).]
SageMaker serverless inference
Creating the Model, the endpoint configuration, and the endpoint works the same as before; only the serverless settings (ServerlessConfig) are added to the endpoint configuration. The Model still combines the inference container image with the ML model.
response = sm_client.create_endpoint_config(
    EndpointConfigName="[YOUR-ENDPOINT-CONFIG]",
    ProductionVariants=[
        {
            "ModelName": "[YOUR-MODEL-NAME]",
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,
                "MaxConcurrency": 20
            }
        }
    ]
)
Available memory sizes: 1 GB / 2 GB / 3 GB / 4 GB / 5 GB / 6 GB
Going to production

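Production requirements
Personas: Product Owner, CDO/CTO, MLOps engineer, data scientist
• Which of the 70+ instance types should I use?
• How do I handle both traffic peaks and quiet periods?
• I want to continuously monitor model drift and data drift.
• Can I deploy without downtime easily?
• Do I need a separate third-party toolkit for load testing?
• I need to set up a CI/CD pipeline.
We want a simple approach, not a complex solution and implementation!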

Updating an endpoint
• Zero downtime
• Various deployment strategies
• Blue/green deployment
• Canary rollout
• Shadow deployment, and more
[Diagram: UpdateEndpoint switches the endpoint from Endpoint Configuration 1 (Model 1) to Endpoint Configuration 2 (Model 2). Each endpoint configuration sets the instance type, instance count, variant, …; each Model combines a Docker image (ECR) with model artifacts (S3).]
model.tar.gz
├── code
│   ├── inference.py
│   └── requirements.txt
└── model.pth

Updating an endpoint
1. Create the Model
aws sagemaker create-model \
  --model-name model2 \
  --primary-container '{"Image": "123.dkr.ecr.amazonaws.com/algo", "ModelDataUrl": "s3://bkt/model2.tar.gz"}' \
  --execution-role-arn arn:aws:iam::123:role/me
2. Create the endpoint configuration (EndpointConfig). The name model2-config is a placeholder; the slide does not give one.
aws sagemaker create-endpoint-config \
  --endpoint-config-name model2-config \
  --production-variants '{"InitialInstanceCount": 2,
    "InstanceType": "ml.m4.xlarge",
    "InitialVariantWeight": 1,
    "ModelName": "model2",
    "VariantName": "AllTraffic"}'
3. Update the Endpoint
aws sagemaker update-endpoint \
  --endpoint-name my-endpoint \
  --endpoint-config-name model2-config

A/B testing
• 1 to 10 production variants
• Input and output schemas must be identical
• The endpoint can be modified without service interruption
[Diagram: SageMaker real-time endpoint. Requests reach the HTTPS endpoint and are load-balanced between Production Variant1 (existing model, Weight = 10) and Production Variant2 (new model, Weight = 1), each on its own ML instances with code and model. Metrics go to Amazon CloudWatch, and UpdateEndpointWeightsAndCapacities adjusts the weights.]
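Variant weights are relative, so the share of traffic each variant receives is its weight divided by the sum of all weights. The sketch below works that out for weights of 10 and 1, assuming (from the order of the diagram labels) that the existing model carries weight 10 and the new model weight 1, and then builds the request for boto3's `update_endpoint_weights_and_capacities`. The helper names and variant names are hypothetical.

```python
def traffic_shares(weights):
    """Convert relative variant weights into fractions of total traffic."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def build_weight_update(endpoint_name, weights):
    """Keyword arguments for update_endpoint_weights_and_capacities."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": name, "DesiredWeight": weight}
            for name, weight in weights.items()
        ],
    }


# Existing model at weight 10, new model at weight 1 (assumed mapping).
shares = traffic_shares({"Variant1": 10, "Variant2": 1})
# Variant1 receives 10/11 of requests (about 90.9%), Variant2 1/11 (about 9.1%).

# Later, shift to an even split without recreating the endpoint.
update_req = build_weight_update("my-endpoint", {"Variant1": 1, "Variant2": 1})
```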

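Zero-downtime deployment with deployment guardrails
• New feature launched in January 2022
• Fully managed blue/green deployment strategy service
• Provides traffic-shifting policies (Canary, Linear)
• Automatically rolls back to the existing model when an error occurs
[Diagram: the Router sends 75% and 25% of traffic to Endpoint v1 and Endpoint v2. An error occurs (code error, framework version dependency, and so on) and an Amazon CloudWatch alarm fires.]
https://aws.amazon.com/ko/blogs/machine-learning/take-advantage-of-advanced-deployment-strategies-using-amazon-sagemaker-deployment-guardrails/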
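A canary rollout with automatic rollback is configured through the `DeploymentConfig` argument of `update_endpoint`. The sketch below builds such a configuration; the helper name, the alarm name, the 25% canary size, and the wait times are illustrative assumptions rather than values prescribed by the slides.

```python
def build_canary_deployment_config(alarm_names, canary_percent=25,
                                   wait_seconds=300,
                                   termination_wait_seconds=120):
    """DeploymentConfig for a blue/green update with canary traffic shifting."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Send this share of capacity to the new fleet first.
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Bake time before the remaining traffic is shifted.
                "WaitIntervalInSeconds": wait_seconds,
            },
            # How long the old fleet is kept after the shift completes.
            "TerminationWaitInSeconds": termination_wait_seconds,
        },
        "AutoRollbackConfiguration": {
            # If any of these CloudWatch alarms fires, traffic returns to the old fleet.
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


deployment_config = build_canary_deployment_config(["my-endpoint-5xx-errors"])

# Assumed next step:
#   sm.update_endpoint(EndpointName="my-endpoint",
#                      EndpointConfigName="model2-config",
#                      DeploymentConfig=deployment_config)
```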

Zero-downtime deployment with deployment guardrails (continued)
[Diagram: the Router now sends 100% of traffic to a single endpoint; Amazon CloudWatch and the CloudWatch alarm still watch Endpoint v1 and Endpoint v2.]

Autoscaling endpoints
• Scales automatically based on the Amazon CloudWatch metrics of the endpoint instances
• Min and max instances
• Target invocations per instance
• Scaling cooldowns
• Built-in and custom scaling policies

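Scaling options
• TargetTrackingScaling: scales based on Amazon CloudWatch metrics
• Step scaling: an advanced type of scaling. Define additional policies that adjust the number of instances dynamically according to the size of the alarm breach
• Scheduled scaling: use when demand follows a specific schedule. Supports one-time schedules, recurring schedules, and cron expressions
• On-demand scaling: adjust the number of instances manually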

1. Register the autoscaling target
aws application-autoscaling register-scalable-target \
  --service-namespace sagemaker \
  --resource-id endpoint/my-endpoint/variant/model2 \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --min-capacity 2 \
  --max-capacity 5
2. Create the scaling policy
aws application-autoscaling put-scaling-policy \
  --policy-name model2-scaling \
  --service-namespace sagemaker \
  --resource-id endpoint/my-endpoint/variant/model2 \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
  '{"TargetValue": 50,
    "CustomizedMetricSpecification":
      {"MetricName": "CPUUtilization",
       "Namespace": "/aws/sagemaker/Endpoints",
       "Dimensions":
         [{"Name": "EndpointName", "Value": "my-endpoint"},
          {"Name": "VariantName", "Value": "model2"}],
       "Statistic": "Average",
       "Unit": "Percent"}}'
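The two CLI calls have direct counterparts on boto3's `application-autoscaling` client (`register_scalable_target` and `put_scaling_policy`). As a variation, this sketch tracks the predefined `SageMakerVariantInvocationsPerInstance` metric instead of the customized CPU metric above, and sets the cooldowns explicitly. The helper name, the target of 100 invocations, and the cooldown values are assumptions for illustration.

```python
def build_autoscaling_requests(endpoint_name, variant_name,
                               min_capacity, max_capacity,
                               target_invocations,
                               scale_in_cooldown=300, scale_out_cooldown=60):
    """Return (register_scalable_target, put_scaling_policy) keyword arguments."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{variant_name}-scaling",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target number of invocations per instance.
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
        },
    }
    return register, policy


register_req, policy_req = build_autoscaling_requests(
    "my-endpoint", "model2", min_capacity=2, max_capacity=5,
    target_invocations=100,
)

# Assumed next step:
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**register_req)
#   aas.put_scaling_policy(**policy_req)
```

A shorter scale-out cooldown than scale-in cooldown is a common choice: it adds capacity quickly under load and removes it cautiously.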

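Summary of the four main patterns
Feature | Real-time | Batch | Asynchronous | Serverless
GPU support | O | O | O | X
Autoscaling | O | N/A | O | O
Scale to zero | X | N/A | O | O
Multi-container | O | X | X | X
Multi-model | O | X | X | X
Payload size | 6 MB | (not stated) | 1 GB | 4 MB
Timeout | 60 s | N/A | 15 min | 60 s
Blue/green guardrails | O | X | X | 1-step only
PrivateLink support | O | O | O | X
A/B testing (multiple production variants) | O | X | X | X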

Cost optimization

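Cost optimization requirements
Personas: Product Owner, CDO/CTO, HW engineer, IoT engineer, data scientist, ML engineer
• We were billed 10 million won in a single month. How can we cut costs?
• Is there a way to meet 100 ms latency while paying only 100,000 won a month?
• We need to deploy models to a million devices, and 30 FPS must be guaranteed on a Raspberry Pi.
We want a simple approach, not a complex solution and implementation!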

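Elastic Inference
• Accelerates inference at a fraction of the cost of a full GPU instance
• Adds a GPU accelerator to a CPU instance
• Works with inference endpoints and notebook instances
• Supports the TensorFlow and MXNet frameworks (other frameworks can be used through ONNX)
[Diagram: inside a VPC and Availability Zone, a SageMaker endpoint or notebook instance connects to Amazon Elastic Inference through AWS PrivateLink.]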

Deep learning frameworks for Elastic Inference
https://s3.console.aws.amazon.com/s3/buckets/amazonei-tensorflow/?region=us-east-1&tab=objects
https://s3.console.aws.amazon.com/s3/buckets/amazonei-apachemxnet?region=us-east-1&tab=objects#
https://amazonei-pytorcheia.s3.amazonaws.com/releases/v1.0.0/torcheia-1.0.0-cp36-cp36m-manylinux1_x86_64.whl
Provided through the AWS Deep Learning AMI (DLAMI), public ECR Docker images, and public AWS S3 buckets

Cost example: streaming video analysis
CPU instance with Elastic Inference
• Instance: c5.xlarge
• EIA: eia2.medium
• Region: us-east-1
• c5.xlarge hourly rate: $0.17
• eia2.medium hourly rate: $0.12
• Total hourly rate: $0.29
• Total monthly cost: $0.29 * 24 * 31 = $215.76
GPU instance
• Instance: p2.xlarge
• Region: us-east-1
• p2.xlarge hourly rate: $0.90
• Total hourly rate: $0.90
• Total monthly cost: $0.90 * 24 * 31 = $669.60
PyTorch code example
PyTorch 1.3.1
• Create the TorchScript
input_shape = [1, 3, 224, 224]
input = torch.zeros(input_shape).float()
model = torch.jit.trace(model.eval(), input)
torch.jit.save(model, save_dir)
• model_fn()
# Required when using Elastic Inference
with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
    traced_model = torch.jit.trace(model, x)
• Deploy the model
pytorch_model.deploy(...,
    framework_version='1.3.1',
    accelerator_type='ml.eia2.medium')
PyTorch 1.5.1
• Create the TorchScript
model = torch.jit.script(model.eval())
torch.jit.save(model, save_dir)
• model_fn()
model = torch.jit.load(save_dir, map_location=torch.device('cpu'))
# Disable profiling executor
torch._C._jit_set_profiling_executor(False)
• Deploy the model
_ecr_image = "763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-inference-eia:<image_tag>"
pytorch_model.deploy(...,
    image_uri=_ecr_image,
    framework_version='1.5.1',
    accelerator_type='ml.eia2.medium')
SageMaker Neo
• Open-source compiler
• Provides the DLR runtime (1)
• Flexibility (no ML/DL runtime required)
• Provides SageMaker APIs
• No additional charge
https://github.com/neo-ai
[Diagram: Amazon SageMaker Neo compiles a framework model for a target OS and hardware. OS: Android, iOS, Linux, Windows. HW: Ambarella, ARM, Intel, NVIDIA, NXP, Qualcomm, Texas Instruments, Xilinx.]
(1) DLR: a compact, common runtime for deep learning models and decision tree models compiled by SageMaker Neo, TVM, or Treelite.

SageMaker Neo compilation example
• Can compile for several target devices at the same time
• Compilation takes about 4 to 6 minutes
• Cost-free: no charge for the AWS resources used
https://github.com/aws-samples/aiot-e2e-sagemaker-greengrass-v2-nvidia-jetson
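SageMaker Inference Recommender
Designed for MLOps engineers and data scientists, it shortens the time needed to bring a model into production.
• Instance recommendation: recommends machine learning instance types for the initial deployment
• Endpoint recommendation: endpoint configuration settings that meet the production requirements
• Load testing: runs extensive load tests that take the production requirements (throughput, latency) into account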

How SageMaker Inference Recommender works
1. Register a model package (inference container image + ML model) in the SageMaker Model Registry (v1, v2, …).
2. Upload sample inputs to Amazon S3 (payload.tar.gz).
3. SageMaker Inference Recommender automatically creates multiple SageMaker hosting endpoints and runs load tests against them.
4. Check the recommendation results through (1) the API, describe_inference_recommendations_job(…), or (2) the SageMaker Studio console.
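Model serving tips
Checklist
• If traffic is light, start with serverless inference or asynchronous inference.
• You do not need to stand up many endpoints (use multi-model endpoints or multi-container endpoints).
Debugging!!
• Debug thoroughly in your local development environment.
• Build the serving infrastructure in local mode first.
Latency
• For large models, consider model distillation, model compilation, or quantization.
• Use SageMaker Inference Recommender to identify the best instance for your use case.
• Also evaluate alternatives such as SageMaker Neo, Inferentia, and gRPC (TensorFlow only).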

Resources
SageMaker Inference
How it works https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html
Developer guide https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html
SageMaker Python SDK https://sagemaker.readthedocs.io/en/stable/overview.html
Workshop
Workshop main page https://main.dczm1kv9dpvdi.amplifyapp.com/
GitHub https://github.com/aws-samples/sm-model-serving-patterns
Other hands-on material
End-to-end AIoT on AWS https://github.com/aws-samples/aiot-e2e-sagemaker-greengrass-v2-nvidia-jetson
Hugging Face MLOps w/ SageMaker https://github.com/daekeun-ml/sm-huggingface-kornlp
Deep learning inference samples https://github.com/aws-samples/sagemaker-inference-samples-kr
Model serving CDK https://github.com/aws-samples/amazon-sagemaker-model-serving-using-aws-cdk

Thank you!
Daekeun Kim
AIML Specialist Solutions Architect
daekeun@amazon.com
