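Learn about PAI, Alibaba Cloud's machine learning platform built on large-scale, high-performance distributed computing. PAI helps customers carry out large-scale data mining and modeling with ease.
As China's first machine learning platform, Alibaba Cloud PAI was built for designing AI programs and is an effective tool for solving customers' real-world problems.
The main features of Alibaba Cloud PAI are:
• Diverse and innovative algorithms: PAI provides more than 100 algorithms covering data preprocessing, neural networks, regression, classification, prediction, evaluation, statistical analysis, feature engineering, and deep learning architectures.
• Deep learning architecture: PAI's entire computing architecture is optimized for a variety of deep learning frameworks. It also supports one-click deployment of models as APIs (application programming interfaces), which addresses the problem of integrating modeling with services.
• Large-scale computing power: PAI is powered by Apsara, Alibaba Cloud's large-scale computing engine, and provides ultra-large-scale distributed computing that can handle petabyte-scale computing jobs every day.
• User-friendly interface: PAI's visual interface lets developers drag and drop components into a workflow quickly and conveniently, which improves the efficiency of model building and debugging.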
4. Machine Learning Platform for AI (PAI): an all-in-one ML platform that includes PAI Studio
Infrastructure layers
• Basic hardware (CPU, GPU, FPGA, NPU)
• Alibaba Cloud Container Service for K8S (ACK)
• Compute Engine (MaxCompute / EMR / Realtime Compute)
• Machine Learning Framework (Alink / MPI / PS / Graph / TensorFlow / PyTorch / Caffe…)
Data collection: intelligent labeling
• Multi-scene templates: image detection, segmentation, and comprehensive annotation
• Data set management
• Active learning *
• Smart pre-labeling
• Elastic scaling
Visual modeling: PAI-Studio
• Nearly 200 algorithm components
• Drag-and-drop method to build an experiment
• Supports many feature samples
Interactive modeling: PAI-DSW
• Deep integration of big data engines
• Multiple frameworks: TF, PyTorch
• JupyterLab, WebIDE, Terminal
Cloud-native deep learning training: PAI-DLC
• Cloud-native and containerized
• Elasticity
• Out of the box
Automatic learning: PAI AutoLearning
• Zero-threshold use
• Transfer learning framework
• One-stop solution
Online prediction: PAI-EAS
• One-click deployment
• High performance
• Blue-green deployment
• Blade compilation optimization
Intelligent ecological market: AI Taobao platform
• AI solutions
• Algorithms & models
• Business application APIs
5. PAI-Studio: GUI and Distributed Modeling Platform
• Workflow stages: data preprocessing, feature engineering, model training, model evaluation, prediction
• Nearly 200 ML algorithm components
• PySpark and Spark in PAI Studio
[Slide figures whose labels were lost: 7; 90%]
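The stages this slide names (data preprocessing, feature engineering, model training, model evaluation, prediction) can be illustrated with a minimal, standard-library Python sketch. It is not PAI-Studio code and does not use any PAI API; the synthetic data, the function names, and the choice of logistic regression are all assumptions made for illustration.

```python
import math
import random

def fit_scaler(rows):
    """Data preprocessing: learn per-column mean and standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(rows, scaler):
    """Apply a previously fitted scaler to new rows."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def add_interaction(rows):
    """Feature engineering: append the product of the first two columns."""
    return [r + [r[0] * r[1]] for r in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(model, x):
    """Predicted probability of class 1 for one row."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(X, y, lr=0.5, epochs=300):
    """Model training: full-batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errs = [score((w, b), xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / len(X)
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / len(X)
    return w, b

def predict(model, X):
    """Prediction: threshold the probability at 0.5."""
    return [1 if score(model, x) >= 0.5 else 0 for x in X]

def accuracy(y_true, y_pred):
    """Model evaluation: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic data (an assumption): the label is 1 when the two features sum to > 0.
random.seed(0)
raw = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1 if a + b > 0 else 0 for a, b in raw]
train_raw, test_raw = raw[:150], raw[150:]

scaler = fit_scaler(train_raw)                    # fit on training rows only
X_train = add_interaction(scale(train_raw, scaler))
X_test = add_interaction(scale(test_raw, scaler))
model = train(X_train, labels[:150])
acc = accuracy(labels[150:], predict(model, X_test))
print(f"held-out accuracy: {acc:.2f}")
```

Each function corresponds to one component that a drag-and-drop experiment would wire together; the scaler is fitted on the training rows only so that the evaluation step sees genuinely unseen data.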
7. PAI-DLC: Deep Learning Containers, a cloud-native deep learning training platform
Architecture
• Control panel: the PAI console (create, deploy, update, destroy) and the Arena command-line tool, used to manage DLC clusters and jobs
• ECS clusters: ECS instances inside VPCs, each running DLC
• DLC clusters: TF jobs, PT jobs, Jupyter, dashboard, EIP, API server, scheduler
Cloud-native
• Fully cloud-based infrastructure
• Kubernetes-native and containerized, with ACK/ECI support
• Semi-hosted and fully hosted modes
• Elastic resources, dynamic scaling
Linear acceleration
• Data parallelism and model parallelism
• 10 Classification Task
High cost performance
• (GN6V, GN5, …)
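The data parallelism mentioned on this slide can be sketched in a few lines of plain Python. This is an illustration of the general technique, not PAI-DLC code: the two "workers" are simulated in one process, and the toy model (fitting y ≈ w·x), the function names, and the learning rate are assumptions.

```python
def gradient(w, shard):
    """Mean-squared-error gradient of y ≈ w * x, computed on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker computes a gradient on its own
    shard, the gradients are averaged (the all-reduce), and all workers
    apply the same update."""
    local_grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad

# Toy data generated from y = 3x, split into two equal shards (one per worker).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 6))
```

Because the shards are the same size, the averaged gradient equals the gradient over the whole batch, so the result matches single-worker training while each worker processes only half of the data per step. Model parallelism, by contrast, would split the model's parameters across workers rather than the data.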
8. PAI-AutoLearning: built on a transfer learning framework developed on PAI-TF
Layers
• PAI-TF
• Network layer: ResNet, LeNet, VGG, GoogLeNet, InceptionNet, ...
• AutoML-Tuning: automatic parameter tuning
• Transfer learning framework
• Business layer: image, …, visual
Benefits
• Zero-threshold use
• Works with a small amount of data
• One-stop solution
• Out of the box and beginner-friendly
10. PAI-EAS: Elastic Algorithm Service for online prediction
Cloud-native online services
• 400,000+ QPS
• Traditional machine learning and deep learning models
• Scaling and blue-green deployment
• Processor SDK
• PAI-Studio and PAI-DSW customers
• PAI-Blade model compilation
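Blue-green deployment, listed on this slide, keeps the current model version serving traffic while a new version is loaded alongside it, then switches traffic in one step. The sketch below illustrates the idea in plain Python; the `BlueGreenRouter` class and its methods are hypothetical names for this example and are not part of any PAI-EAS SDK.

```python
class BlueGreenRouter:
    """Holds two model slots and routes every request to the live one."""

    def __init__(self, model):
        self.slots = {"blue": model, "green": None}
        self.live = "blue"

    def _idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, model):
        """Load a new version into the idle slot; live traffic is untouched."""
        self.slots[self._idle()] = model

    def switch(self, smoke_input):
        """Smoke-test the idle version, then send all traffic to it."""
        candidate = self.slots[self._idle()]
        if candidate is None:
            raise RuntimeError("no version deployed in the idle slot")
        candidate(smoke_input)  # an exception here aborts the switch
        self.live = self._idle()

    def rollback(self):
        """Return traffic to the previous version, which is still loaded."""
        if self.slots[self._idle()] is None:
            raise RuntimeError("no previous version to roll back to")
        self.live = self._idle()

    def predict(self, x):
        return self.slots[self.live](x)


router = BlueGreenRouter(lambda x: x * 2)   # version 1 serves traffic
router.deploy(lambda x: x * 3)              # version 2 loaded, not yet live
before_switch = router.predict(3)
router.switch(smoke_input=1)
after_switch = router.predict(3)
router.rollback()
after_rollback = router.predict(3)
print(before_switch, after_switch, after_rollback)
```

The design choice worth noting is that the old version stays loaded after the switch, which is what makes rollback a single pointer change rather than a redeployment.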
11. PAI-EAS: Elastic Algorithm Service for online prediction
• Cloud native
• ML/DL model support: TensorFlow, Caffe, PMML, OfflineModel, …
• Calculation: different alarm timepoints; every task needs an alarm timepoint
• RESTful API
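Since the slide states that deployed models are reached through a RESTful API, a client call might look like the sketch below. The endpoint URL, the token value, the use of the `Authorization` header, and the `{"instances": ...}` payload shape are all placeholders assumed for illustration; the actual request format depends on the deployed service and its processor. The request is only built here, not sent.

```python
import json
import urllib.request

def build_predict_request(endpoint, token, instances):
    """Build (but do not send) a JSON POST request for a prediction service.

    The payload schema and the Authorization header are assumptions for
    this sketch, not a documented contract.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    req = urllib.request.Request(endpoint, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", token)
    return req

# Hypothetical endpoint and token, used only to show the call shape.
req = build_predict_request(
    "https://example.invalid/api/predict/demo_service",
    "HYPOTHETICAL_TOKEN",
    [[0.1, 0.2, 0.3]],
)
print(req.get_method(), req.full_url)

# Sending would be: urllib.request.urlopen(req, timeout=5)
# It is left out because the endpoint above is a placeholder.
```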
16. PAI
• VOC, …
Workflow:
• PAI Studio
• PAI EAS
Product: PAI + MaxCompute + DataWorks
• Cloud resources: NAS, OSS, …
• Data: MaxCompute, DataWorks, DataHub
• Distributed training
• Developer tools: PAI Studio, PAI Notebook Service (DSW), PAI EAS (Elastic Algorithm Service)
• Applications: OCR, machine translation, NLP, content security, image identification, risk control, Brain, public opinion, marketing
• AI Security
20. Recommendation architecture built on PAI (offline, real-time, and online paths)
Underlying basic data
• User data, material data, and third-party portraits in RDS: MySQL
• User behavior logs collected through Nginx, Flume, and Kafka
• Comment data in DRDS
Data processing and storage
• Data integration: hourly cycle import into the MaxCompute DW (user table, material table)
• MaxCompute DW features: user, material, and behavior characteristics
• Flink real-time computing: ETL, statistics business, and real-time features, exchanged through Kafka
(Offline) User/material feature engineering and training
• PAI-Studio: sample generation, then the recall algorithm
• PAI-Studio: sample generation, then the sorting (ranking) algorithm
Online recommendation storage
• RDS: MySQL: user/material recommendation list
• Redis: user/material features and user vectors
• OSS transit: item vectors and model files
Model services
• PAI-EAS: inference service
• Faiss server: vector service (query the K most similar items)
Online shop service (online), on K8S
• POLARDB: reading history, split into sub-table 1 and sub-table 2
• Recommended module, triggered by a user exposure request: multi-channel recall, exposure deduplication filtering, sorting
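The online path in this diagram (query the K most similar items, multi-channel recall, exposure deduplication filtering, sorting) can be sketched in plain Python. This is an illustrative stand-in only: brute-force cosine similarity replaces the Faiss vector service, a lookup table replaces the PAI-EAS inference service, and every name and value below is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_similar(user_vec, item_vecs, k):
    """Brute-force stand-in for the vector service: K most similar items."""
    ranked = sorted(item_vecs,
                    key=lambda item: cosine(user_vec, item_vecs[item]),
                    reverse=True)
    return ranked[:k]

def recommend(user_vec, item_vecs, hot_items, exposed, score, k):
    # Multi-channel recall: merge the vector channel with a hot-list
    # channel, keeping first-seen order and dropping repeats.
    recalled = top_k_similar(user_vec, item_vecs, k) + hot_items
    candidates = list(dict.fromkeys(recalled))
    # Exposure deduplication filtering: skip items the user already saw.
    fresh = [item for item in candidates if item not in exposed]
    # Sorting: order the remaining candidates by model score, best first.
    return sorted(fresh, key=score, reverse=True)

# Hypothetical vectors, hot list, exposure history, and model scores.
item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
user_vec = [1.0, 0.0]
model_scores = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}

nearest = top_k_similar(user_vec, item_vecs, k=2)
result = recommend(user_vec, item_vecs, hot_items=["c", "a"],
                   exposed={"a"}, score=model_scores.get, k=2)
print(nearest, result)
```

In this toy run, item "a" is the closest match but is filtered out because the user has already been shown it, and the hot-list item "c" outranks "b" on model score. In the architecture on the slide, the exposure history would come from the POLARDB reading-history tables and the vectors and features from Redis and the vector service.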