Stream Upload and Asynchronous Job Processing System
Lê Bá Minh – minhlb@vng.com.vn
Technical Manager – Zalo Team - VNG
Agenda
• 1/ Why do we need an asynchronous job processing system?
• 2/ How does it work?
• 3/ Applications
• 4/ Q&A
Parallel Stream Upload
• Data is split into chunks
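A minimal sketch of chunked upload, assuming fixed-size chunks tagged with an index so the receiver can reassemble them regardless of arrival order. The function names and the chunk size are hypothetical; the slides do not specify how Zalo splits or labels chunks.

```python
from typing import Dict, Iterator, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, chunk) pairs so chunks can be sent in any order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        yield index, data[offset:offset + chunk_size]


def reassemble(chunks: Dict[int, bytes]) -> bytes:
    """Rebuild the payload from chunks keyed by index."""
    return b"".join(chunks[i] for i in sorted(chunks))


payload = b"0123456789"
received: Dict[int, bytes] = {}
# Simulate out-of-order arrival by walking the chunks in reverse.
for index, chunk in reversed(list(split_into_chunks(payload, 4))):
    received[index] = chunk
restored = reassemble(received)
```

Indexing each chunk is what lets several chunks travel in parallel: the receiver does not depend on arrival order.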
Facts
• Zalo Stream Upload
  • Background continuous voice upload
  • Background image upload
  • …
• Facts (now)
  • 1M voice messages/day
  • 800K images/day
  • Peak: 500 chunks/second
• Expected:
  • Scalable (more than 5000 chunks/second)
  • High performance
What we need
• Asynchronous job processing system
[Diagram: Collect Data → Processing Data → Response, shown twice; in the second flow, Processing Data is handled by Workers]
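One reading of the flow above is that the request handler collects the data, hands the processing to workers, and responds without waiting. The sketch below illustrates only that handoff, using an in-process queue and threads as stand-ins; all names are hypothetical, and it does not represent the distributed job server the deck goes on to describe.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
results = []
results_lock = threading.Lock()


def worker() -> None:
    """Pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        with results_lock:
            results.append(job.upper())  # stand-in for real processing
        jobs.task_done()


def handle_request(data: str) -> str:
    """Collect the data, enqueue the job, and respond immediately."""
    jobs.put(data)
    return "accepted"


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
responses = [handle_request(d) for d in ["a", "b", "c", "d"]]
jobs.join()              # wait until every submitted job has been processed
for _ in threads:
    jobs.put(None)       # one stop sentinel per worker
for t in threads:
    t.join()
```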
What we need
• Asynchronous job processing system
  • Batch jobs
  • Big data jobs
  • Highly reliable: no job is lost
  • Distributed job processing workers
  • High performance
  • Persistent
  • Load balancing, failover, recoverable
Open-source solutions
• Shared-memory workers
  • All workers on one physical server
  • No failover
  • Not scalable
• Gearman
  • Good, but does not completely fit our requirements
  • No batch job support
  • Not fully reliable (jobs can be lost)
  • Load balancing is incomplete
  • Unstable above 2000 jobs/second
Zalo Asynchronous Job Processing System
[Architecture diagram. Components: Client (×2), Job Server, Worker 1, Worker 2, Worker 3, Z Database. Job Server modules: Worker Manager, Job Caching, Job Manager, Persistent Manager, Job Clean-Up. Connections: TCP, labeled Short Connection and Long Connection.]
Implementation
• C/C++ for Job Server
• C/C++, Java for client and workers
• Binary Protocol
• Z-Database
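The slides say only that the protocol is binary; the wire format is not given. As an illustration of what a binary protocol over TCP typically needs, here is a sketch of length-prefixed framing. The header layout (payload length, message type, job id) and the `MSG_SUBMIT` code are assumptions made for this example, not the actual Zalo format.

```python
import struct

# Hypothetical header: 4-byte payload length, 1-byte message type,
# 8-byte job id, all in network byte order (13 bytes, no padding).
HEADER = struct.Struct("!IBQ")

MSG_SUBMIT = 1  # hypothetical message type code


def encode_frame(msg_type: int, job_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(len(payload), msg_type, job_id) + payload


def decode_frames(buffer: bytes):
    """Return (complete_frames, unconsumed_bytes) from a TCP byte stream."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        length, msg_type, job_id = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((msg_type, job_id, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


# One complete frame followed by the first 5 bytes of a second one.
stream = encode_frame(MSG_SUBMIT, 42, b"chunk-0") + encode_frame(MSG_SUBMIT, 43, b"chunk-1")[:5]
frames, rest = decode_frames(stream)
```

Because TCP delivers a byte stream rather than messages, the decoder returns any incomplete tail so the caller can prepend it to the next read.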
Job State
[State diagram. States: Started, Queuing, Processing, Failed, Time Out, Finished. Transition labels: Deliver to Worker, Worker ACK Finished, Worker ACK Failed, No ACK.]
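The job states above can be read as a small state machine. The sketch below assumes the most direct pairing of the labels: delivery moves a job from Queuing to Processing, and the worker's ACK (or its absence) decides the outcome. The slides do not say what happens after Failed or Time Out (for example, whether the job is re-queued), so those transitions are left out. Event names are hypothetical.

```python
from enum import Enum


class State(Enum):
    QUEUING = "Queuing"
    PROCESSING = "Processing"
    FINISHED = "Finished"
    FAILED = "Failed"
    TIME_OUT = "Time Out"


# Assumed transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.QUEUING, "deliver_to_worker"): State.PROCESSING,
    (State.PROCESSING, "worker_ack_finished"): State.FINISHED,
    (State.PROCESSING, "worker_ack_failed"): State.FAILED,
    (State.PROCESSING, "no_ack"): State.TIME_OUT,
}


def next_state(state: State, event: str) -> State:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.value}") from None


state = State.QUEUING
for event in ("deliver_to_worker", "worker_ack_finished"):
    state = next_state(state, event)
```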
Job Type
• Single job
  • Simple task
  • Delivered immediately
• Batch job
  • Multiple tasks
  • Delivered once all tasks have been received
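A minimal sketch of the batch rule, assuming each task carries a batch id, its index within the batch, and the total task count. These fields and the `BatchCollector` class are hypothetical; the slides state only that a batch is delivered once all of its tasks have been received.

```python
from typing import Dict, List, Optional


class BatchCollector:
    """Hold tasks per batch and release the batch only when it is complete."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, str]] = {}  # batch_id -> {task_index: task}

    def add_task(self, batch_id: str, task_index: int,
                 total_tasks: int, task: str) -> Optional[List[str]]:
        tasks = self._pending.setdefault(batch_id, {})
        tasks[task_index] = task  # a repeated index overwrites, it is not counted twice
        if len(tasks) < total_tasks:
            return None  # still waiting for more tasks
        del self._pending[batch_id]
        return [tasks[i] for i in sorted(tasks)]  # ready to deliver, in task order


collector = BatchCollector()
first = collector.add_task("b1", 0, 3, "t0")
second = collector.add_task("b1", 2, 3, "t2")
third = collector.add_task("b1", 1, 3, "t1")
```

A single job is the degenerate case: with `total_tasks` equal to 1, the first call already returns the task for delivery.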
Deployment
[Deployment diagram. Business Server; Job Server 1 and Job Server 2, synchronized; Worker 1, Worker 2, Worker 3.]
Applications
• Used for all asynchronous job processing in Zalo: voice upload, image upload, feed processing…
• Benchmark (single server)
  • 50K images/second (640×480)
  • 50K voice messages/second (30 s)
• Advantages
  • Batch jobs
  • Jobs are never lost
  • Workers can be restarted or stopped at any time
  • Failover, load balancing, quick recovery after failure
• Issue
  • Job duplication (handled by the worker)
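Handling duplication in the worker usually means making processing idempotent. The sketch below assumes every job carries a unique id and that the worker remembers the ids it has completed. The `IdempotentWorker` class is hypothetical, and the in-memory set is for illustration only: a worker that can restart at any time would need to keep this record somewhere durable.

```python
from typing import Callable, Set


class IdempotentWorker:
    """Run the handler at most once per successfully completed job id."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._done: Set[int] = set()

    def process(self, job_id: int, payload: str) -> bool:
        if job_id in self._done:
            return False  # duplicate delivery: skip the work
        self._handler(payload)
        # Record the id only after success, so a job whose handler
        # raised can still be retried on redelivery.
        self._done.add(job_id)
        return True


processed = []
worker = IdempotentWorker(processed.append)
outcomes = [
    worker.process(1, "voice-1"),
    worker.process(1, "voice-1"),  # redelivered duplicate
    worker.process(2, "image-2"),
]
```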
Q&A