Grokking TechTalk #24: Designing a Background Job Queue System with Ruby & PostgreSQL

When the background job queues already available (Sidekiq, Celery, Faktory, etc.) are widely used and of high quality, reinventing the wheel is generally discouraged.

Even so, the Holistics Engineering team designed its own background job queue system on top of Ruby + PostgreSQL to serve the specific needs of the company's B2B system.

Join this talk to hear Huy Nguyen, CTO of Holistics, share how the Holistics team designed this background job queue system, why they wrote their own, and why they chose Ruby + PostgreSQL.

Speaker: Huy Nguyen
- Cofounder & CTO of Holistics Software
- Cofounder of Grokking

Published in: Technology

  1. Building A Job Queue System with PostgreSQL & Ruby
     Huy Nguyen, CTO & Co-founder, Holistics.io
     Grokking TechTalk #24: Background Job Queues
     Ho Chi Minh City, March 2018
  2. About Me
     Education:
     ● Pho Thong Nang Khieu, Tin 04-07
     ● National University of Singapore (NUS), Computer Science major
     Work:
     ● Software Engineer Intern, SenseGraphics (Stockholm, Sweden)
     ● Software Engineer Intern, Facebook (California, US)
     ● Data Infrastructure Engineer, Viki (Singapore)
     Now:
     ● Co-founder & CTO, Holistics Software
     ● Co-founder, Grokking Vietnam
     Contact: huy@holistics.io, facebook.com/huy, bit.ly/huy-linkedin
  3. Some Talks I've Given
     ● Building Analytics Infrastructure at Viki
     ● Why PostgreSQL for Analytics Infrastructure
     ● PostgreSQL Internals: Discussing on Uber Moving From PostgreSQL to MySQL
     ● Now: Building A Job Queue System with PostgreSQL & Ruby
  4. Background
     B2B SaaS web application.
     Connect to customer's DB → run queries → wait for results → display to end users
  5. Requirements
     ● Reliability: each job should be processed exactly once and never missed; job pickup order; retry mechanism.
     ● Job persistence: store job info and track each job's statistics (run duration, start time, end time, etc.).
     ● Multi-tenancy: each customer has its own queue slots.
     (Diagram: Customer A, Customer B, Customer C)
  6. Queue Architecture
     ● Ruby + Rails
     ● PostgreSQL for DB
     ● Sidekiq for background jobs
     (Diagram: Customers A, B and C, each with its own queue slots in the Holistics Job Queue Layer; Sidekiq with a pool of workers.)
  7. Storing Jobs Data

         class DataReport < ApplicationRecord
           include Queuable

           def execute
             # compose and run this report
             return values
           end
         end

         report = DataReport.find(123)

         # normal: execute synchronously, this returns the return value of `execute` method
         report_results = report.execute

         # execute asynchronously, this returns a job ID (int)
         job_id = report.async.execute

         CREATE TABLE jobs (
           id INTEGER PRIMARY KEY,
           source_type VARCHAR,
           source_method VARCHAR,
           source_id INTEGER,
           args JSONB DEFAULT '{}',
           status VARCHAR,
           start_time TIMESTAMP,
           queued_time TIMESTAMP,
           end_time TIMESTAMP,
           created_at TIMESTAMP,
           stats JSONB DEFAULT '{}',
           tenant_id INTEGER
         )

     Job Status: created → queued → running → success / failure
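The status flow above can be read as a small state machine. The following is a minimal sketch of that reading, not code from the talk: the `JobStatus` module and its method names are hypothetical, and the assumption that no other transitions are allowed (for example, retries moving a job back to `created`) is ours.

```ruby
# Hypothetical helper, not part of the talk's code.
module JobStatus
  # Allowed transitions, following the flow on the slide:
  # created -> queued -> running -> success / failure
  TRANSITIONS = {
    'created' => ['queued'],
    'queued'  => ['running'],
    'running' => ['success', 'failure'],
    'success' => [],
    'failure' => []
  }.freeze

  # True if a job may move from status `from` to status `to`.
  def self.valid_transition?(from, to)
    TRANSITIONS.fetch(from, []).include?(to)
  end

  # True if the status is an end state with no outgoing transitions.
  def self.terminal?(status)
    TRANSITIONS.key?(status) && TRANSITIONS[status].empty?
  end
end
```

A check like `JobStatus.valid_transition?(job.status, 'running')` could guard status updates, if the system wanted to reject out-of-order changes.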
  8. SQL to claim next job

         -- find out how many jobs are running per queue, so we know if it's full
         WITH running_jobs_per_queue AS (
           SELECT tenant_id, count(1) AS running_jobs
           FROM jobs
           WHERE (status = 'running' OR status = 'queued')  -- running or queued
             AND created_at > NOW() - INTERVAL '6 HOURS'    -- ignore jobs older than 6 hours
           GROUP BY 1
         ),
         -- find out which queues are full
         full_queues AS (
           SELECT R.tenant_id
           FROM running_jobs_per_queue R
           LEFT JOIN tenant_queues Q ON R.tenant_id = Q.tenant_id
           WHERE R.running_jobs >= Q.num_slots
         )
         SELECT id
         FROM jobs
         WHERE status = 'created'
           AND tenant_id NOT IN (SELECT tenant_id FROM full_queues)
         ORDER BY id ASC
         FOR UPDATE SKIP LOCKED
         LIMIT 1

         CREATE TABLE tenant_queues (
           id INTEGER PRIMARY KEY,
           tenant_id INTEGER,
           num_slots INTEGER
         )

     ● Select the next job whose customer still has available queue slots.
     ● Skip over rows that have already been selected and locked (SKIP LOCKED).
     ● Acquire a row-level lock upon selecting (FOR UPDATE).
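To make the selection rule easier to follow, here is an in-memory Ruby sketch of the same logic: count active jobs per tenant, work out which tenants are full, and pick the lowest-id `created` job from a tenant with a free slot. `JobRow`, `next_job_id` and `ACTIVE_WINDOW_SECONDS` are illustrative names, not from the talk, and the sketch does not model row locking (`FOR UPDATE SKIP LOCKED`), which only the database can provide.

```ruby
# Illustrative stand-in for a row of the jobs table.
JobRow = Struct.new(:id, :tenant_id, :status, :created_at, keyword_init: true)

# Jobs older than this are ignored when counting active jobs (6 hours).
ACTIVE_WINDOW_SECONDS = 6 * 60 * 60

# jobs:  Array of JobRow
# slots: Hash of tenant_id => num_slots (stand-in for tenant_queues)
# Returns the id of the next job to claim, or nil if none is eligible.
def next_job_id(jobs, slots, now: Time.now)
  # Count queued or running jobs per tenant within the time window.
  active = jobs.select do |j|
    %w[running queued].include?(j.status) && j.created_at > now - ACTIVE_WINDOW_SECONDS
  end
  active_per_tenant = active.group_by(&:tenant_id).transform_values(&:size)

  # A tenant is full once its active jobs reach its slot count.
  # Tenants without a slots entry are never treated as full, as in the SQL.
  full_tenants = active_per_tenant.select { |tenant_id, count|
    slots.key?(tenant_id) && count >= slots[tenant_id]
  }.keys

  # Lowest-id created job whose tenant still has a free slot.
  candidate = jobs
    .select { |j| j.status == 'created' && !full_tenants.include?(j.tenant_id) }
    .min_by(&:id)
  candidate&.id
end

now = Time.now
jobs = [
  JobRow.new(id: 1, tenant_id: 1, status: 'running', created_at: now - 60),
  JobRow.new(id: 2, tenant_id: 1, status: 'created', created_at: now - 30),
  JobRow.new(id: 3, tenant_id: 2, status: 'created', created_at: now - 10)
]

# Tenant 1 has one slot and one running job, so job 2 is skipped.
next_job_id(jobs, { 1 => 1, 2 => 1 }, now: now) # => 3
```

With two slots for tenant 1, the same call would return job 2, since the lowest eligible id wins.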
  9. Queuing next job
     Each job is processed in a transaction. Upon finding the next job, change its state and send it over to Sidekiq.

         class Job
           def self.queue_next_job
             ActiveRecord::Base.transaction do
               ret = ActiveRecord::Base.connection.execute(queue_sql)
               # nothing to claim right now: leave the block, method returns nil
               next if ret.values.empty?

               job_id = ret.values.first.first.to_i
               job = Job.find(job_id)

               # mark as queued, then send to background worker
               job.status = 'queued'
               job.save
               JobWorker.perform_async(job_id)
             end
           end
         end
  10. Generic Sidekiq job worker
     This is run inside a Sidekiq worker (background). Pull the relevant instance from the database and construct the object. Invoke the method with the relevant parameters.

         # simplified code
         class JobWorker
           include Sidekiq::Worker

           def perform(job_id)
             job = Job.find(job_id)
             job.status = 'running'
             job.save

             # pull the relevant instance from the database
             obj = job.source_type.constantize.find(job.source_id)
             # invoke the stored method with the stored arguments
             obj.call(job.source_method, job.args)

             job.status = 'success'
             job.save
           rescue
             job.status = 'failure'
             job.save
           ensure
             # whatever happened, try to queue the next job
             Job.queue_next_job
           end
         end
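The rescue/ensure structure of the worker can be tried without Rails or Sidekiq. The sketch below is an in-memory approximation, not the talk's implementation: `SketchJob`, `run_job` and the `on_done` callback are hypothetical names, dispatch via `public_send` with splatted arguments is an assumption about how the stored method is invoked, and recording the duration in `stats` is only one possible way to meet the statistics requirement.

```ruby
# Illustrative in-memory job record; the real system stores jobs in PostgreSQL.
SketchJob = Struct.new(:id, :target, :method_name, :args, :status, :stats,
                       keyword_init: true)

# Runs one job, records its outcome, and always calls `on_done` afterwards,
# mirroring the rescue/ensure structure of the worker.
def run_job(job, on_done:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  job.status = 'running'

  # Assumption: the stored method is invoked with positional arguments.
  job.target.public_send(job.method_name, *job.args)
  job.status = 'success'
rescue StandardError => e
  job.status = 'failure'
  job.stats[:error] = e.message
ensure
  # Runs on success and on failure, so the next job is always considered.
  job.stats[:duration] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  on_done.call
end

finished = 0
ok_job = SketchJob.new(id: 1, target: [3, 1, 2], method_name: :sort,
                       args: [], status: 'queued', stats: {})
bad_job = SketchJob.new(id: 2, target: 1, method_name: :/,
                        args: [0], status: 'queued', stats: {})

run_job(ok_job, on_done: -> { finished += 1 })   # ends in 'success'
run_job(bad_job, on_done: -> { finished += 1 })  # ZeroDivisionError, ends in 'failure'
```

Because the callback sits in `ensure`, a failing job still frees its slot and lets the queue move on, which is the property the worker relies on.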
  11. Supervisor vs non-supervisor
     Master:
     ● Other job queues: dedicated process to receive requests
     ● Holistics job queue: SQL + inline with existing Rails or Sidekiq process
     Workers:
     ● Other job queues: dedicated processes or threads
     ● Holistics job queue: pass over to Sidekiq
  12. The .async keyword
     Easily switch between synchronous and asynchronous execution. No need to write dedicated worker code, thanks to Ruby's metaprogramming.

         class DataReport < ApplicationRecord
           include Queuable

           def execute
             # compose and run this report
             return values
           end
         end

         report = DataReport.find(123)

         # normal: execute synchronously, this returns the return value of `execute` method
         report_results = report.execute

         # execute asynchronously, this returns a job ID (int)
         job_id = report.async.execute
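The slides do not show how `Queuable` implements `.async`. One plausible approach, sketched here under that caveat, is a proxy object that uses `method_missing` to record a job instead of running the method. `AsyncProxy`, `JOB_STORE` and the `Report` class are hypothetical; an in-memory array stands in for the jobs table, and the recorded fields follow the column names of the `jobs` table.

```ruby
# Hypothetical proxy: records a job for any method the target responds to.
class AsyncProxy
  def initialize(target, store)
    @target = target
    @store = store
  end

  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    # Record what to run later instead of running it now.
    @store << {
      id: @store.size + 1,
      source_type: @target.class.name,
      source_id: @target.id,
      source_method: name.to_s,
      args: args,
      status: 'created'
    }
    @store.last[:id]
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

module Queuable
  # Stand-in for the jobs table; a real implementation would insert a row.
  JOB_STORE = []

  def async
    AsyncProxy.new(self, JOB_STORE)
  end
end

# Plain Ruby class standing in for an ActiveRecord model.
class Report
  include Queuable
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def execute
    "results for report #{id}"
  end
end

report = Report.new(123)
report_results = report.execute   # runs inline and returns the results
job_id = report.async.execute     # records a job and returns its id
```

The appeal of this shape is that any public method on a `Queuable` object can be queued without writing a worker class for it; a generic worker only needs the recorded type, id, method name and arguments.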
  13. Summary
     ● Why reinvent the wheel? Or did we?
     ● What we have now.
     (Diagram: Customers A, B and C; Holistics Job Queue Layer; Sidekiq with a pool of workers.)
  14. Other Job Queue Using PostgreSQL: que
     ● que, an open-source job queue written for Ruby & PostgreSQL.
     ● Uses advisory locks.
     ● Learn more: https://github.com/chanks/que
  15. Q&A
  16. Jobs @ Holistics
     Looking for comrades to join our small team.
     Why?
     ● Global, enterprise product used by well-known companies (Grab, Traveloka, ...)
     ● Small, lean team that moves fast
     Positions:
     ● Backend Engineer
     ● Front-end Engineer
     ● Technical Sales Engineer
     ● Product Manager
     holistics.io/careers
