Building Efficient and Reliable Crawler System With Sidekiq Enterprise

Photo: http://cliparts.co/clipart/3666251
Has anyone ever written crawlers?
Has anyone ever used cron?
Has anyone ever used Sidekiq?
Gary (Chien-Wei Chu)

@icarus4 / @icarus4.chu
Was a C programmer

Have been in love with Ruby since 2013
CTO of Statementdog
I Play
Photo: https://static01.nyt.com/images/2016/08/19/sports/19BADMINTONweb3/19BADMINTONweb3-master675.jpg
Photo: http://classic.battle.net/images/battle/scc/protoss/pix/units/screenshots/d05.jpg
Photo: http://resources.workable.com/wp-content/uploads/2015/08/ruby-560x224.jpg
• Introduction to Statementdog
• Data behind Statementdog
• Past practice of Statementdog
• Problems of the past practice
• How we design our system to solve the problems.
Focus on:
• More reliable job scheduling
• Dealing with throttling issue
(Revenue)
(EPS)
(Gross Margin)
(Net Income)
(Assets)
(Liabilities)
(Operating Cash Flow)
(Free Cash Flow)
(Investing Cash Flow)
(ROE)
(ROA)
(Accounts Receivable)
(Accounts Payable)
(PMI)
GDP


Taiwan Market Observation Post System
Taiwan Stock Exchange
Taiwan Depository & Clearing Corporation
Yahoo Stock Feed
…
…
Yearly - dividend, remuneration of directors and supervisors
Quarterly - quarterly financial statements
Monthly - Revenue
Weekly -
Daily - closing price
Hourly - stock news from Yahoo stock feed
Minutely - important news from Taiwan Market Observation Post System
Something like this, but written in PHP
A super long-running process (1 hour+) that loops from the first stock to the last one

    Stock.find_each do |stock|
      # download XML financial report data
      …
      # extract XML data
      …
      # calculate advanced data
      …
    end

• A super long-running process for quarterly reports
• A super long-running process for monthly revenue
• A super long-running process for daily prices
• A super long-running process for news
• …
• Really slow
• Inefficient - unable to only retry the failed one
• Unpredictable server loading
[Figure: server loading over time, in two cases. When the server loading is low: Jobs 1 to 5 laid out along the time axis. When the server loading is HIGH: Jobs 1 to 5 stacked on top of another task.]
Too many crawler processes executed at the same time
• Really slow
• Inefficient - unable to only retry the failed one.
• Unpredictable server loading
• Scale out is not easy
• Inherent problems of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize jobs by popularity
• Not easy to deal with bandwidth throttling issues


Sidekiq: created by Mike Perham
[Diagram: requests arrive at the web server process, which pushes jobs to the job queue (very fast). Worker processes on a worker server take jobs from the queue. The web server is the producer and the worker server is the consumer. Add extra servers when needed.]
[Diagram: a single-process worker vs. a multi-threaded worker process running threads 1 to 25. Ratio 1 : 25, with the same degree of memory consumption.]
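The producer/consumer split and the one-process, many-threads worker model can be sketched with Ruby's standard library alone. This is only an illustration, not how Sidekiq is implemented: Sidekiq keeps its queue in Redis, while the in-memory `Queue`, the `run_pool` helper, and the thread count of 5 are assumptions made up for this example.

```ruby
# Producer/consumer sketch with one process and several worker threads.
# Assumption: an in-memory Queue stands in for the Redis-backed job queue.

def run_pool(job_args, thread_count: 5)
  queue   = Queue.new
  results = Queue.new   # thread-safe collector for finished work

  # Producer: pushing to the queue is cheap, so it returns quickly.
  job_args.each { |arg| queue << arg }
  queue.close           # no more jobs; pop returns nil once the queue is drained

  # Consumers: every thread pulls jobs until the queue is empty.
  workers = Array.new(thread_count) do
    Thread.new do
      while (arg = queue.pop)
        results << yield(arg)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

processed = run_pool((1..20).to_a) { |stock_id| "crawled stock #{stock_id}" }
puts processed.size
```

All threads share the memory of one process, which is the point of the 1 : 25 comparison: more concurrent jobs for roughly the same memory footprint.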
Sidekiq (OSS)
Sidekiq Pro
Sidekiq Enterprise
Sidekiq Pro            Sidekiq Enterprise
Batches                Rate Limiting
Enhanced Reliability   Periodic Jobs
Search in Web UI       Unique Jobs
Worker Metrics         Historical Metrics
Expiring Jobs          Multi-process
                       Encryption
Parallelism Makes Things Faster
• Really slow
• Inefficient - unable to only retry the failed one.
• Unpredictable server loading
• Scale out is not easy
• Efficient - only retry the failed one
• Predictable server loading
• Easy to scale out
• Inherent problem of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize job by the popularity
• Not easy to deal with bandwidth throttling issue
–Mike Perham, CEO, Contributed Systems,
Creator of Sidekiq
Keep the state of cron executions in the most robust part of our system: the database
All scheduled jobs are invoked by one particular job that is executed every minute
Create a table for storing cron settings
table name: cron_jobs

    create_table :cron_jobs do |t|
      # worker class name
      t.string :klass, null: false
      # cron expression, something like "0 */2 * * *"
      t.string :cron_expression, null: false
      # when the job should next be executed
      t.timestamp :next_run_at, null: false, index: true
    end
klass                        cron_expression   next_run_at
Push2000NewsJobs             "0 */2 * * *"     …
Push2000DailyPriceJobs       "0 2 * * 1-5"     …
Push2000MonthlyRevenueJobs   "0 0 10 * *"      …
…
    # Add to your Cron setting
    every :minute do
      runner 'CronJobWorker.perform_async'
    end

Cron schedules only one job, every minute
CronJobWorker invokes all of your crawlers

    class CronJobWorker
      include Sidekiq::Worker

      def perform
        # Find the jobs that should be executed
        CronJob.where("next_run_at <= ?", Time.now).find_each do |job|
          # Push the job to the job queue (Sidekiq::Client.push expects string keys)
          Sidekiq::Client.push(
            'class' => job.klass.constantize,
            'args'  => ['foo', 'bar']
          )

          # Set up the next execution time
          parser = Sidekiq::CronParser.new(job.cron_expression)
          job.update!(next_run_at: parser.next.to_time)
        end
      end
    end

Missed job executions will be executed in the next minute
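The catch-up behaviour can be illustrated without Rails or Sidekiq. The sketch below is a simplified model, not the production code: a `Struct` named `CronRow` stands in for the cron_jobs table, and a fixed interval in seconds stands in for the cron expression. Both are assumptions made for this example.

```ruby
# In-memory sketch of the "keep the schedule state in the database" idea.
# Assumptions: CronRow replaces the cron_jobs table and a fixed interval in
# seconds replaces the cron expression; both are hypothetical simplifications.

CronRow = Struct.new(:klass, :interval, :next_run_at)

# One run of the every-minute job: returns the worker names that were due
# and moves each of them to its first run time after `now`.
def tick(rows, now)
  due = rows.select { |row| row.next_run_at <= now }
  due.each do |row|
    row.next_run_at += row.interval while row.next_run_at <= now
  end
  due.map(&:klass)
end

start = Time.utc(2017, 1, 1, 0, 0, 0)
rows  = [CronRow.new("Push2000NewsJobs", 120, start + 120)]

on_time   = tick(rows, start + 60)    # nothing is due yet
# The tick at start + 120 never happens (for example, the server was down).
recovered = tick(rows, start + 180)   # the missed run is picked up here

p on_time
p recovered
```

Because the due time lives in a row rather than in cron's memory, a skipped tick only delays the job; the next tick still finds `next_run_at` in the past and runs it.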
Drawbacks solved
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue


table: cron_jobs

klass              cron_expression   args                   next_run_at
Push2000NewsJobs   "0 */2 * * *"     []                     …
NewsWorker         "*/30 * * * *"    [popular_stock_id_1]   …
NewsWorker         "*/30 * * * *"    [popular_stock_id_2]   …
…
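With an args column, the push in CronJobWorker could pass row-specific arguments instead of fixed ones, so popular stocks get their own, more frequent rows. A minimal sketch of building that payload follows. It assumes the args column holds a JSON array, and the `CronJobRow` struct, the `payload_for` helper, and the stock id 42 are all hypothetical.

```ruby
require "json"

# Hypothetical row shape: the args column is assumed to hold a JSON array.
CronJobRow = Struct.new(:klass, :cron_expression, :args)

# Build the hash that would be handed to Sidekiq::Client.push.
def payload_for(row)
  { "class" => row.klass, "args" => JSON.parse(row.args) }
end

rows = [
  CronJobRow.new("Push2000NewsJobs", "0 */2 * * *", "[]"),
  CronJobRow.new("NewsWorker", "*/30 * * * *", "[42]")   # 42 is a made-up stock id
]

payloads = rows.map { |row| payload_for(row) }
p payloads
```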
Drawbacks solved
• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
    Sidekiq.configure_server do |config|
      config.periodic do |mgr|
        # Standard 5-field cron expression: run CronJobWorker every minute
        mgr.register("* * * * *", CronJobWorker)
      end
    end



• Inherent problem of Unix Cron:
• Unreliable scheduling
• Hard to prioritize job by the popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issue
You always want your crawler to be as fast as possible
However, your target server doesn't always allow you to crawl at an unlimited rate
Insert 2000 jobs into the queue at the same time

    Stock.pluck(:id).each do |stock_id|
      SomeWorker.perform_async(stock_id)
    end

If you want to crawl data for your 2000 stocks
Assume the target server accepts requests at a maximum rate of 1 request / second
[Timeline: job1, job2, job3, …, job2000 all start in second 1.]
All of your jobs may be blocked (except the first one)
Improvement 1
Schedule jobs with incremental delays

    Stock.pluck(:id).each_with_index do |stock_id, index|
      SomeWorker.perform_in(index, stock_id)
    end
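The incremental delay above assumes one request per second, so the job at position `index` waits `index` seconds. For other rates the delay is `index / rate`. The sketch below works through the arithmetic for 2000 stocks; `delay_for` is a hypothetical helper and the id range is a stand-in for `Stock.pluck(:id)`.

```ruby
# Delay (in seconds) for the job at position `index` when the target accepts
# `rate_per_second` requests per second. Hypothetical helper for illustration.
def delay_for(index, rate_per_second)
  index.fdiv(rate_per_second)
end

stock_ids = (1..2000).to_a   # stand-in for Stock.pluck(:id)

schedule = stock_ids.each_with_index.map do |stock_id, index|
  [stock_id, delay_for(index, 1)]
end

first_delay = schedule.first.last
last_delay  = schedule.last.last
puts "first job after #{first_delay} s, last job after #{last_delay} s"
puts "total spread: about #{(last_delay / 60).round} minutes"
```

In the loop above this would presumably become `SomeWorker.perform_in(delay_for(index, rate), stock_id)`.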

[Timeline: job1, job2, job3, …, job2000 are spread over seconds 1, 2, 3, …, 2000.]
Workable, but…
If the target server is unreachable, job3~2000 will still execute at the same time
• Limit your worker threads so that they perform a specific job at a bounded rate
• Sidekiq Enterprise provides two types of rate limiting API
    CONCURRENT_LIMITER = Sidekiq::Limiter.concurrent('price', 10)

    def perform(...)
      CONCURRENT_LIMITER.within_limit do
        # crawl stock data
      end
    end

Only 10 concurrent operations inside the block can happen at any given moment
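To make the idea of a concurrent limiter concrete, here is a single-process sketch built on the standard library's `SizedQueue`. It is not how Sidekiq Enterprise implements its limiter (that one coordinates through Redis across processes); the `LocalConcurrentLimiter` class and the limit of 3 are assumptions for illustration.

```ruby
# Single-process sketch of a concurrent limiter.
# Assumption: this only illustrates the idea; Sidekiq Enterprise's limiter
# coordinates through Redis across processes, which this class does not do.
class LocalConcurrentLimiter
  def initialize(limit)
    @slots = SizedQueue.new(limit)   # at most `limit` tokens can be held at once
  end

  def within_limit
    @slots.push(:token)              # blocks while all slots are taken
    begin
      yield
    ensure
      @slots.pop                     # release the slot even if the block raised
    end
  end
end

limiter = LocalConcurrentLimiter.new(3)
lock    = Mutex.new
running = 0
peak    = 0

threads = Array.new(12) do
  Thread.new do
    limiter.within_limit do
      lock.synchronize { running += 1; peak = [peak, running].max }
      sleep 0.01                     # pretend to crawl stock data
      lock.synchronize { running -= 1 }
    end
  end
end
threads.each(&:join)

puts "peak concurrency: #{peak}"
```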
    BUCKET_LIMITER = Sidekiq::Limiter.bucket('price', 10, :second)

    def perform(...)
      BUCKET_LIMITER.within_limit do
        # crawl stock data
      end
    end

For every second, you can perform up to 10 operations
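A bucket limiter counts operations per time window rather than operations in flight. The sketch below is a single-process, fixed-window version for illustration only, not Sidekiq Enterprise's implementation. `LocalBucketLimiter`, `try_acquire`, and the injectable clock are hypothetical, and unlike `within_limit` this sketch returns false instead of waiting when the window is full.

```ruby
# Single-process sketch of a fixed-window ("bucket") limiter.
# Assumption: illustration only; it is not Sidekiq Enterprise's implementation.
class LocalBucketLimiter
  def initialize(limit, clock: -> { Time.now.to_i })
    @limit  = limit
    @clock  = clock      # injectable so the example does not need to sleep
    @window = nil
    @count  = 0
    @lock   = Mutex.new
  end

  # Returns true and counts the operation if the current one-second window
  # still has room; returns false otherwise.
  def try_acquire
    @lock.synchronize do
      now = @clock.call
      if now != @window
        @window = now
        @count  = 0
      end
      return false if @count >= @limit
      @count += 1
      true
    end
  end
end

fake_now = 100
limiter  = LocalBucketLimiter.new(10, clock: -> { fake_now })

first_second  = Array.new(12) { limiter.try_acquire }
fake_now      = 101
second_second = limiter.try_acquire

puts first_second.count(true)
puts second_second
```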
You must fine-tune the parameters of your limiter for each data source for better performance
By now, you already get better performance.
However, the throttling control of your target server may not always be static.
Many websites apply throttling dynamically.
If throttling is detected, pause your workers for a while
[Diagram: Redis (job queue) holds the queues default, critical, low, and yahoo; worker threads pull jobs from them.]
• Pause the yahoo queue when throttled
• Schedule a job, executed after a few seconds in another queue, to "unpause" the paused queue
• The yahoo queue is resumed after the unpause job has executed
    class SomeWorker
      include Sidekiq::Worker

      def perform
        # try to crawl something
        # ...

        if throttled
          # Pause this worker's own queue, then schedule a job to resume it later
          queue_name = self.class.get_sidekiq_options['queue'].to_s
          queue = Sidekiq::Queue.new(queue_name)
          queue.pause!
          ResumeJobQueueWorker.perform_in(30.seconds, queue_name)
        end
      end
    end

    class ResumeJobQueueWorker
      include Sidekiq::Worker
      sidekiq_options queue: :queue_control, unique: :until_executed

      def perform(queue_name)
        queue = Sidekiq::Queue.new(queue_name)
        queue.unpause! if queue.paused?
      end
    end
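The `throttled` check in SomeWorker is left open in the slides. One possible way to fill it in is sketched below, assuming the target signals throttling with HTTP 429 or 503 and may send a numeric Retry-After header. Real sites can behave differently, and `throttled?`, `pause_seconds`, and both constants are hypothetical names, not part of Sidekiq.

```ruby
# Hypothetical helpers for the `if throttled` check above.
# Assumption: the target signals throttling with HTTP 429 or 503; real sites
# may behave differently (for example, by serving a block page with 200).
THROTTLE_STATUSES = [429, 503].freeze
DEFAULT_PAUSE     = 30   # seconds, the same pause length the worker above uses

def throttled?(status)
  THROTTLE_STATUSES.include?(status.to_i)
end

# How long to pause the queue: honour a numeric Retry-After value if present.
def pause_seconds(retry_after)
  seconds = Integer(retry_after.to_s, 10, exception: false)
  seconds && seconds.positive? ? seconds : DEFAULT_PAUSE
end

puts throttled?(429)
puts pause_seconds("120")
puts pause_seconds(nil)
```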



The queue for ResumeJobQueueWorker MUST NOT be the same as the paused queue
We have a dedicated queue for ResumeJobQueueWorker
Decrease the Sidekiq server poll interval for more precise timing control
Queue pausing alleviates throttling issues
Is it possible for us to do things even better?
Most throttling controls aim to block requests from the same IP address
We can change our IP address via a proxy service
[Diagram: without a proxy, the Sidekiq server sends every request to the target server from the same address, a.b.c.d.]
Same IP for each request
[Diagram: with a proxy service, the Sidekiq server (a.b.c.d) sends requests to the proxy service endpoint, which forwards them to the target server through proxy servers e.f.g.h, i.j.k.l, m.n.o.p, and q.r.s.t.]
Different IP for each request
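On the crawler side, using such a service usually only means pointing the HTTP client at the proxy endpoint; the rotation of outgoing IPs happens inside the service. A minimal sketch with Ruby's standard `Net::HTTP` follows. The proxy host, port, and target URL are placeholders, and `proxied_client` is a hypothetical helper.

```ruby
require "net/http"
require "uri"

# Hypothetical proxy endpoint; substitute the one given by your proxy service.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080

# Build an HTTP client for `url` that sends its requests through the proxy.
# No connection is opened until a request is actually made.
def proxied_client(url)
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT)
  http.use_ssl = (uri.scheme == "https")
  http
end

client = proxied_client("https://example.com/some/page")
puts client.proxy?
puts client.proxy_address
```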
• With Sidekiq (Enterprise) and a proper design, the following problems are solved:
• Slow crawler
• Inefficient - unable to only retry the failed one
• Unpredictable server loading
• Scale out is not easy
• Inherent problems of Unix Cron
• Not easy to deal with bandwidth throttling issues
Building Efficient and Reliable Crawler System With Sidekiq Enterprise
Building Efficient and Reliable Crawler System With Sidekiq Enterprise
1 of 147

Recommended

Scala in-practice-3-years by Patric Fornasier, Springr, presented at Pune Sca... by
Scala in-practice-3-years by Patric Fornasier, Springr, presented at Pune Sca...Scala in-practice-3-years by Patric Fornasier, Springr, presented at Pune Sca...
Scala in-practice-3-years by Patric Fornasier, Springr, presented at Pune Sca...Thoughtworks
5.1K views57 slides
A web app in pure Clojure by
A web app in pure ClojureA web app in pure Clojure
A web app in pure ClojureDane Schneider
4.6K views69 slides
BTV PHP - Building Fast Websites by
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesJonathan Klein
1.6K views52 slides
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014 by
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014datafundamentals
942 views43 slides
Promise of a better future by Rahul Goma Phulore and Pooja Akshantal, Thought... by
Promise of a better future by Rahul Goma Phulore and Pooja Akshantal, Thought...Promise of a better future by Rahul Goma Phulore and Pooja Akshantal, Thought...
Promise of a better future by Rahul Goma Phulore and Pooja Akshantal, Thought...Thoughtworks
5.7K views20 slides
Developer-friendly taskqueues: What you should ask yourself before choosing one by
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneSylvain Zimmer
684 views30 slides

More Related Content

What's hot

Modern websites in 2020 and Joomla by
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaGeorge Wilson
589 views45 slides
Lotuscript for large systems by
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systemsBill Buchan
818 views56 slides
The Many Ways to Test Your React App by
The Many Ways to Test Your React AppThe Many Ways to Test Your React App
The Many Ways to Test Your React AppAll Things Open
2.1K views96 slides
Enterprise Integration Patterns with Apache Camel by
Enterprise Integration Patterns with Apache CamelEnterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache CamelIoan Eugen Stan
9.2K views52 slides
Developing Microservices with Apache Camel by
Developing Microservices with Apache CamelDeveloping Microservices with Apache Camel
Developing Microservices with Apache CamelClaus Ibsen
6.8K views101 slides
Apache Camel K - Copenhagen by
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - CopenhagenClaus Ibsen
681 views85 slides

What's hot(20)

Modern websites in 2020 and Joomla by George Wilson
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
George Wilson589 views
Lotuscript for large systems by Bill Buchan
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systems
Bill Buchan818 views
The Many Ways to Test Your React App by All Things Open
The Many Ways to Test Your React AppThe Many Ways to Test Your React App
The Many Ways to Test Your React App
All Things Open2.1K views
Enterprise Integration Patterns with Apache Camel by Ioan Eugen Stan
Enterprise Integration Patterns with Apache CamelEnterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache Camel
Ioan Eugen Stan9.2K views
Developing Microservices with Apache Camel by Claus Ibsen
Developing Microservices with Apache CamelDeveloping Microservices with Apache Camel
Developing Microservices with Apache Camel
Claus Ibsen6.8K views
Apache Camel K - Copenhagen by Claus Ibsen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - Copenhagen
Claus Ibsen681 views
Modernizing Legacy Applications in PHP, por Paul Jones by iMasters
Modernizing Legacy Applications in PHP, por Paul JonesModernizing Legacy Applications in PHP, por Paul Jones
Modernizing Legacy Applications in PHP, por Paul Jones
iMasters1.1K views
Apache Camel K - Copenhagen v2 by Claus Ibsen
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
Claus Ibsen1.8K views
Camel Day Italy 2021 - What's new in Camel 3 by Claus Ibsen
Camel Day Italy 2021 - What's new in Camel 3Camel Day Italy 2021 - What's new in Camel 3
Camel Day Italy 2021 - What's new in Camel 3
Claus Ibsen542 views
Apache Camel K - Fredericia by Claus Ibsen
Apache Camel K - FredericiaApache Camel K - Fredericia
Apache Camel K - Fredericia
Claus Ibsen751 views
Reactive Xamarin. UA Mobile 2016. by UA Mobile
Reactive Xamarin. UA Mobile 2016.Reactive Xamarin. UA Mobile 2016.
Reactive Xamarin. UA Mobile 2016.
UA Mobile288 views
PAC 2019 virtual Christoph NEUMÜLLER by Neotys
PAC 2019 virtual Christoph NEUMÜLLERPAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLER
Neotys66 views
Developer day - AWS: Fast Environments = Fast Deployments by Matthew Cwalinski
Developer day - AWS: Fast Environments = Fast DeploymentsDeveloper day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast Deployments
Matthew Cwalinski407 views
Apache Camel Introduction & What's in the box by Claus Ibsen
Apache Camel Introduction & What's in the boxApache Camel Introduction & What's in the box
Apache Camel Introduction & What's in the box
Claus Ibsen4.8K views
State of Akka 2017 - The best is yet to come by Konrad Malawski
State of Akka 2017 - The best is yet to comeState of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to come
Konrad Malawski5.5K views
Using The Right Tool For The Job by Chris Baldock
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
Chris Baldock2.1K views
ApacheCon EU 2016 - Apache Camel the integration library by Claus Ibsen
ApacheCon EU 2016 - Apache Camel the integration libraryApacheCon EU 2016 - Apache Camel the integration library
ApacheCon EU 2016 - Apache Camel the integration library
Claus Ibsen1.7K views
Ansible benelux meetup - Amsterdam 27-5-2015 by Pavel Chunyayev
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
Pavel Chunyayev875 views
Using Apache Camel as AKKA by Johan Edstrom
Using Apache Camel as AKKAUsing Apache Camel as AKKA
Using Apache Camel as AKKA
Johan Edstrom6.2K views

Viewers also liked

T2 ejercicio04 by
T2 ejercicio04T2 ejercicio04
T2 ejercicio04University of Granada
345 views8 slides
Actividad nº 2 by
Actividad nº 2Actividad nº 2
Actividad nº 2Renzo Higinio
193 views4 slides
Taller n°1 yili leidy by
Taller n°1 yili leidyTaller n°1 yili leidy
Taller n°1 yili leidytatiana sanchez
147 views18 slides
Nrszh by
NrszhNrszh
NrszhJulia Czenner
615 views1 slide
La animales en peligro de extinción by
La animales en peligro de extinciónLa animales en peligro de extinción
La animales en peligro de extincióngetina24
115 views3 slides
Steiner by
SteinerSteiner
SteinerJoão Soares
1K views6 slides

Viewers also liked(11)

La animales en peligro de extinción by getina24
La animales en peligro de extinciónLa animales en peligro de extinción
La animales en peligro de extinción
getina24115 views
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib... by Distilled
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
Distilled4.2K views
Introducing Cloudera Director at Big Data Bash by Andrei Savu
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
Andrei Savu1.3K views
Onboarding The Ruby Way by Layne McNish
Onboarding The Ruby WayOnboarding The Ruby Way
Onboarding The Ruby Way
Layne McNish522 views

Similar to Building Efficient and Reliable Crawler System With Sidekiq Enterprise

End to-end async and await by
End to-end async and awaitEnd to-end async and await
End to-end async and awaitvfabro
1.5K views43 slides
Manchester Serverless Meetup - July 2018 by
Manchester Serverless Meetup - July 2018Manchester Serverless Meetup - July 2018
Manchester Serverless Meetup - July 2018Jonathan Vines
193 views30 slides
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014 by
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014Jon Milsom
840 views77 slides
Continuous Integration, the minimum viable product by
Continuous Integration, the minimum viable productContinuous Integration, the minimum viable product
Continuous Integration, the minimum viable productJulian Simpson
4.3K views60 slides
Asynchronous Processing with Ruby on Rails (RailsConf 2008) by
Asynchronous Processing with Ruby on Rails (RailsConf 2008)Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Asynchronous Processing with Ruby on Rails (RailsConf 2008)Jonathan Dahl
21.4K views117 slides
Writing Asynchronous Programs with Scala & Akka by
Writing Asynchronous Programs with Scala & AkkaWriting Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & AkkaYardena Meymann
654 views41 slides

Similar to Building Efficient and Reliable Crawler System With Sidekiq Enterprise(20)

End to-end async and await by vfabro
End to-end async and awaitEnd to-end async and await
End to-end async and await
vfabro1.5K views
Manchester Serverless Meetup - July 2018 by Jonathan Vines
Manchester Serverless Meetup - July 2018Manchester Serverless Meetup - July 2018
Manchester Serverless Meetup - July 2018
Jonathan Vines193 views
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014 by Jon Milsom
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Jon Milsom840 views
Continuous Integration, the minimum viable product by Julian Simpson
Continuous Integration, the minimum viable productContinuous Integration, the minimum viable product
Continuous Integration, the minimum viable product
Julian Simpson4.3K views
Asynchronous Processing with Ruby on Rails (RailsConf 2008) by Jonathan Dahl
Asynchronous Processing with Ruby on Rails (RailsConf 2008)Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Jonathan Dahl21.4K views
Writing Asynchronous Programs with Scala & Akka by Yardena Meymann
Writing Asynchronous Programs with Scala & AkkaWriting Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & Akka
Yardena Meymann654 views
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and... by Databricks
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Databricks324 views
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM by Manuel Bernhardt
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVMVoxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Manuel Bernhardt1.4K views
Give your little scripts big wings: Using cron in the cloud with Amazon Simp... by Amazon Web Services