Photo: http://cliparts.co/clipart/3666251
Has anyone ever written crawlers?
Has anyone ever used cron?
Has anyone ever used Sidekiq?
Gary (Chien-Wei Chu)

@icarus4 / @icarus4.chu
Was a C programmer

In love with Ruby since 2013
CTO of Statementdog
I play
Photo: https://static01.nyt.com/images/2016/08/19/sports/19BADMINTONweb3/19BADMINTONweb3-master675.jpg
Photo: http://classic.battle.net/images/battle/scc/protoss/pix/units/screenshots/d05.jpg
Photo: http://resources.workable.com/wp-content/uploads/2015/08/ruby-560x224.jpg
• Introduction to Statementdog
• Data behind Statementdog
• Past practice of Statementdog
• Problems of the past practice
• How we designed our system to solve the problems
Focus on:
• More reliable job scheduling
• Dealing with throttling issues
Revenue, EPS, Gross Margin, Net Income, Assets, Liabilities, Operating Cash Flow, Free Cash Flow, Investing Cash Flow, ROE, ROA, Accounts Receivable, Accounts Payable, PMI, GDP


Taiwan Market Observation Post System
Taiwan Stock Exchange
Taiwan Depository & Clearing Corporation
Yahoo Stock Feed
…
…
Yearly - dividend, remuneration of directors and supervisors
Quarterly - quarterly financial statements
Monthly - Revenue
Weekly -
Daily - closing price
Hourly - stock news from Yahoo stock feed
Minutely - important news from Taiwan Market Observation Post System
Something like this, but written in PHP:
a super long-running process (1 hour+)
that loops from the first stock to the last one
Stock.find_each do |stock|
  # download xml financial report data
  # …
  # extract xml data
  # …
  # calculate advanced data
  # …
end
A super long-running process for quarterly reports
A super long-running process for monthly revenue
A super long-running process for daily prices
A super long-running process for news
…
• Really slow
• Inefficient - unable to retry only the failed one
• Unpredictable server loading
[Figure: server loading over time. When the server loading is low, Job 1 to Job 5 run without contention. When the server loading is HIGH, Job 1 to Job 5 pile up on top of other tasks.]
Too many crawler processes executed at the same time
• Really slow
• Inefficient - unable to retry only the failed one
• Unpredictable server loading
• Scaling out is not easy
• Inherent problems of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize jobs by popularity
• Not easy to deal with bandwidth throttling issues


Sidekiq, created by Mike Perham
[Figure: producer/consumer architecture. The web server process receives requests and pushes jobs to the job queue (very fast); it is the producer. Worker processes on worker servers take jobs from the queue; they are the consumers. Add extra worker servers when needed.]
[Figure: single process vs. multi-thread. One multi-threaded Sidekiq worker process runs 25 threads (thread 1 … thread 25), so with the same degree of memory consumption you get 25 concurrent workers instead of one (1 : 25).]
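The producer/consumer split and the 1 : 25 ratio can be imitated in plain Ruby. This is only an in-process analogue, assuming a `Thread::Queue` in place of Redis and 25 threads in place of Sidekiq's worker threads; `crawl` and `run_pool` are hypothetical names standing in for real crawling work.

```ruby
# In-process analogue of Sidekiq's model: a producer pushes small jobs onto a
# queue, and one process runs many consumer threads that pop and execute them.
# Thread::Queue stands in for Redis; this is not how Sidekiq is implemented.

THREADS = 25
STOP = :stop # sentinel telling a consumer thread to exit

# Hypothetical stand-in for crawling one stock.
def crawl(stock_id)
  "report for stock #{stock_id}"
end

def run_pool(stock_ids, threads: THREADS)
  jobs = Thread::Queue.new
  results = Thread::Queue.new

  # Consumers: each thread pops jobs until it sees the sentinel.
  workers = Array.new(threads) do
    Thread.new do
      while (stock_id = jobs.pop) != STOP
        results << [stock_id, crawl(stock_id)]
      end
    end
  end

  # Producer: pushing a job is cheap, so this loop finishes quickly.
  stock_ids.each { |id| jobs << id }
  threads.times { jobs << STOP }

  workers.each(&:join)
  Array.new(results.size) { results.pop }.to_h
end

reports = run_pool((1..2000).to_a)
puts reports.size # => 2000
```

Because each job covers one stock, a failure affects only that job, which is what lets Sidekiq retry only the failed one instead of rerunning a whole hour-long loop.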
Sidekiq (OSS)
Sidekiq Pro
Sidekiq Enterprise
Sidekiq Pro: Batches, Enhanced Reliability, Search in Web UI, Worker Metrics, Expiring Jobs
Sidekiq Enterprise adds: Rate Limiting, Periodic Jobs, Unique Jobs, Historical Metrics, Multi-process, Encryption
With Sidekiq:
• Really slow → parallelism makes things faster
• Inefficient - unable to retry only the failed one → efficient - only the failed one is retried
• Unpredictable server loading → predictable server loading
• Scaling out is not easy → easy to scale out
• Inherent problems of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize jobs by popularity
• Not easy to deal with bandwidth throttling issues
–Mike Perham, CEO, Contributed Systems,
Creator of Sidekiq
Keep the state of cron executions in the most robust part of our system: the database
All scheduled jobs are invoked by one particular job that is executed every minute
Create a table for storing cron settings (table name: cron_jobs)
create_table :cron_jobs do |t|
  t.string :klass, null: false                        # worker class name
  t.string :cron_expression, null: false              # something like "0 */2 * * *"
  t.timestamp :next_run_at, null: false, index: true  # when the job should be executed next
end
klass | cron_expression | next_run_at
Push2000NewsJobs | "0 */2 * * *" | …
Push2000DailyPriceJobs | "0 2 * * 1-5" | …
Push2000MonthlyRevenueJobs | "0 0 10 * *" | …
…
# Add to your cron setting (config/schedule.rb, whenever gem syntax)
every :minute do
  runner 'CronJobWorker.perform_async'
end
Cron only schedules one job, once a minute
CronJobWorker invokes all of your crawlers
class CronJobWorker
  include Sidekiq::Worker

  def perform
    # Find the jobs that should be executed (due now or overdue)
    CronJob.where('next_run_at <= ?', Time.now).find_each do |job|
      # Push the job to the job queue
      # (Sidekiq::Client.push expects string keys)
      Sidekiq::Client.push(
        'class' => job.klass.constantize,
        'args'  => ['foo', 'bar']
      )

      # Set up the next execution time
      x = Sidekiq::CronParser.new(job.cron_expression)
      job.update!(next_run_at: x.next.to_time)
    end
  end
end
Missed job executions will be executed the next minute
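To see why missed executions are caught up, here is a pure-Ruby simulation of one dispatcher tick. It is a sketch under stated assumptions: `TinyCron` is a hypothetical minimal cron matcher (not `Sidekiq::CronParser`), an array of `CronRow` structs stands in for the `cron_jobs` table, an array stands in for the job queue, and all times are UTC.

```ruby
require 'set'

# Hypothetical minimal cron matcher for illustration (NOT Sidekiq::CronParser).
# Supports "*", "n", "a-b", "*/n", "a-b/n" and comma lists in the five fields
# minute, hour, day of month, month, day of week.
# Simplification: day of month AND day of week must both match
# (real cron ORs them when both are restricted).
class TinyCron
  RANGES = [0..59, 0..23, 1..31, 1..12, 0..6].freeze

  def initialize(expression)
    fields = expression.split
    raise ArgumentError, "expected 5 fields, got #{fields.size}" unless fields.size == 5
    @sets = fields.zip(RANGES).map { |field, range| parse_field(field, range) }
  end

  # First whole minute strictly after +time+ that matches the expression.
  def next_after(time)
    t = Time.at((time.to_i / 60 + 1) * 60).utc
    (366 * 24 * 60).times do
      return t if matches?(t)
      t += 60
    end
    raise ArgumentError, 'no matching time within a year'
  end

  private

  def matches?(t)
    [t.min, t.hour, t.day, t.month, t.wday].zip(@sets).all? { |v, set| set.include?(v) }
  end

  def parse_field(field, range)
    field.split(',').flat_map do |part|
      base, step = part.split('/')
      values =
        if base == '*' then range.to_a
        elsif base.include?('-') then Range.new(*base.split('-').map(&:to_i)).to_a
        else [base.to_i]
        end
      values.each_slice((step || 1).to_i).map(&:first)
    end.to_set
  end
end

# In-memory stand-in for a row of the cron_jobs table.
CronRow = Struct.new(:klass, :cron_expression, :next_run_at)

# One minutely tick: enqueue every due row, then move next_run_at forward
# from +now+, so an overdue row fires once instead of once per missed slot.
def dispatch(rows, queue, now)
  rows.select { |row| row.next_run_at <= now }.each do |row|
    queue << row.klass
    row.next_run_at = TinyCron.new(row.cron_expression).next_after(now)
  end
end

now = Time.utc(2017, 9, 15, 2, 0) # a Friday
rows = [
  CronRow.new('Push2000NewsJobs', '0 */2 * * *', Time.utc(2017, 9, 15, 0, 0)),         # missed
  CronRow.new('Push2000DailyPriceJobs', '0 2 * * 1-5', Time.utc(2017, 9, 15, 2, 0)),    # due now
  CronRow.new('Push2000MonthlyRevenueJobs', '0 0 10 * *', Time.utc(2017, 10, 10, 0, 0)) # not due
]
queue = []
dispatch(rows, queue, now)
p queue               # => ["Push2000NewsJobs", "Push2000DailyPriceJobs"]
p rows[0].next_run_at # => 2017-09-15 04:00:00 UTC
p rows[1].next_run_at # => 2017-09-18 02:00:00 UTC (next weekday)
```

Computing the next run from `now` rather than from the old `next_run_at` means an outage leads to one catch-up run, not a burst of them; the `CronJobWorker` above should behave the same way, assuming its parser's `next` starts from the current time.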
Drawbacks solved:
• Inherent problems of Unix Cron:
• Unreliable scheduling (solved)
• Hard to prioritize jobs by popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issues


table: cron_jobs
klass | cron_expression | args | next_run_at
Push2000NewsJobs | "0 */2 * * *" | [] | …
NewsWorker | "*/30 * * * *" | [popular_stock_id_1] | …
NewsWorker | "*/30 * * * *" | [popular_stock_id_2] | …
…
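One way to generate such rows: give the most popular stocks their own every-30-minutes `NewsWorker` row with the stock id in `args`, and leave the rest to the two-hourly `Push2000NewsJobs` batch. This is a sketch assuming a hypothetical popularity score per stock (for example page views) and a hypothetical `news_cron_rows` helper; the dispatcher would then presumably push `'args' => job.args` instead of fixed placeholder arguments.

```ruby
# Hypothetical: build cron_jobs rows so the top_n most popular stocks get news
# every 30 minutes, while the batch job still covers every stock every 2 hours.
# +popularity+ maps stock_id => score (e.g. page views).
def news_cron_rows(popularity, top_n:)
  popular_ids = popularity.sort_by { |_id, score| -score }.first(top_n).map(&:first)

  batch = { klass: 'Push2000NewsJobs', cron_expression: '0 */2 * * *', args: [] }
  per_stock = popular_ids.map do |stock_id|
    { klass: 'NewsWorker', cron_expression: '*/30 * * * *', args: [stock_id] }
  end

  [batch, *per_stock]
end

rows = news_cron_rows({ 101 => 9_000, 102 => 4_000, 103 => 50 }, top_n: 2)
rows.each { |row| p row }
```

Re-running the helper periodically and syncing its output into `cron_jobs` would let the schedule follow popularity as it changes.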
Drawbacks solved:
• Inherent problems of Unix Cron:
• Unreliable scheduling (solved)
• Hard to prioritize jobs by popularity (solved)
• High availability is not easy
• Not easy to deal with bandwidth throttling issues
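Replace the Unix cron entry with Sidekiq Enterprise periodic jobs:
Sidekiq.configure_server do |config|
  config.periodic do |mgr|
    # run CronJobWorker every minute (standard cron expression)
    mgr.register('* * * * *', 'CronJobWorker')
  end
end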
• Inherent problems of Unix Cron:
• Unreliable scheduling (solved)
• Hard to prioritize jobs by popularity (solved)
• High availability is not easy (solved)
• Not easy to deal with bandwidth throttling issues
You always want your crawler to be as fast as possible
However, the target server doesn't always allow you to crawl at an unlimited rate
Insert 2000 jobs into the queue at the same time
Stock.pluck(:id).each do |stock_id|
  SomeWorker.perform_async(stock_id)
end
If you want to crawl data for your 2000 stocks
Assume the target server accepts requests at a maximum rate of 1 request/second
[Timeline (seconds): job1, job2, job3, …, job2000 all start at second 1]
Insert 2000 jobs into the queue at the same time
All of your jobs may be blocked (except the first one)
Improvement 1
Schedule jobs with incremental delays
Stock.pluck(:id).each_with_index do |stock_id, index|
  SomeWorker.perform_in(index, stock_id)
end
[Timeline (seconds): job1 at second 1, job2 at second 2, job3 at second 3, …, job2000 at second 2000]
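The delay in `perform_in(index, stock_id)` assumes 1 request per second. A small generalization, assuming a hypothetical `rate` parameter (requests per second the target allows) and a hypothetical `staggered_delays` helper, computes the offset for each job:

```ruby
# Spread jobs evenly: with +rate+ requests/second allowed, job i starts at i / rate seconds.
def staggered_delays(stock_ids, rate:)
  raise ArgumentError, 'rate must be positive' unless rate.positive?
  stock_ids.each_with_index.map { |stock_id, index| [stock_id, index / rate.to_f] }
end

schedule = staggered_delays((1..2000).to_a, rate: 1)
p schedule.first(3) # => [[1, 0.0], [2, 1.0], [3, 2.0]]
p schedule.last     # => [2000, 1999.0]

# With Sidekiq, each pair would become:
#   schedule.each { |stock_id, delay| SomeWorker.perform_in(delay, stock_id) }
```

Like the original, this is a fixed schedule computed up front: it cannot react if the target server slows down or becomes unreachable.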
Workable, but…
If the target server is unreachable, job3~2000 will still execute at the same time
• Limit your worker threads to perform a specific job at a bounded rate
• Sidekiq Enterprise provides rate limiting APIs; two types are shown here
CONCURRENT_LIMITER = Sidekiq::Limiter.concurrent('price', 10)

def perform(...)
  CONCURRENT_LIMITER.within_limit do
    # crawl stock data
  end
end
Only 10 concurrent operations inside the block can happen at any given moment
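To make the semantics concrete, here is an in-process sketch of a concurrent limiter built on `Mutex` and `ConditionVariable`. It is not Sidekiq's implementation: Sidekiq Enterprise's limiters are backed by Redis and therefore shared across processes and machines, while this sketch (hypothetical class `ConcurrentLimiter`) only limits threads inside one Ruby process.

```ruby
# In-process concurrent limiter: at most +limit+ threads inside the block at once.
class ConcurrentLimiter
  def initialize(limit)
    @limit = limit
    @in_use = 0
    @mutex = Mutex.new
    @released = ConditionVariable.new
  end

  # Waits for a free slot, runs the block, and always releases the slot.
  def within_limit
    @mutex.synchronize do
      @released.wait(@mutex) while @in_use >= @limit # block until a slot frees up
      @in_use += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @in_use -= 1
        @released.signal
      end
    end
  end
end

limiter = ConcurrentLimiter.new(10)
peak = 0
active = 0
counter_lock = Mutex.new

threads = Array.new(50) do
  Thread.new do
    limiter.within_limit do
      counter_lock.synchronize { active += 1; peak = [peak, active].max }
      sleep 0.01 # pretend to crawl stock data
      counter_lock.synchronize { active -= 1 }
    end
  end
end
threads.each(&:join)
puts "peak concurrency: #{peak}" # never more than 10
```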
BUCKET_LIMITER = Sidekiq::Limiter.bucket('price', 10, :second)

def perform(...)
  BUCKET_LIMITER.within_limit do
    # crawl stock data
  end
end
For every second, you can perform up to 10 operations
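A bucket limiter counts operations per fixed time window. The sketch below illustrates that idea in-process only; `BucketLimiter`, `OverLimit`, and the injectable `clock` are hypothetical names, and Sidekiq Enterprise's own limiter is Redis-backed with its own waiting and rescheduling behavior.

```ruby
# Fixed-window counter: up to +count+ operations per +interval+ seconds.
class BucketLimiter
  OverLimit = Class.new(StandardError)

  def initialize(count, interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @count = count
    @interval = interval
    @clock = clock
    @window = nil
    @used = 0
    @mutex = Mutex.new
  end

  # Runs the block if the current window still has capacity, else raises OverLimit.
  def within_limit
    @mutex.synchronize do
      window = (@clock.call / @interval).floor
      if window != @window # a new window starts: reset the counter
        @window = window
        @used = 0
      end
      raise OverLimit, "more than #{@count} operations in this window" if @used >= @count
      @used += 1
    end
    yield
  end
end

now = 0.0
limiter = BucketLimiter.new(10, 1, clock: -> { now }) # fake clock for a deterministic demo
done = 0
12.times do
  limiter.within_limit { done += 1 }
rescue BucketLimiter::OverLimit
  # a real worker would retry later
end
puts done # => 10
now = 1.0 # the next one-second window
limiter.within_limit { done += 1 }
puts done # => 11
```

A fixed window can let up to twice `count` operations through around a window boundary (a full bucket at the end of one window, another at the start of the next), which is one reason the parameters need tuning per data source.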
You must fine-tune the parameters of your limiter for each data source for better performance
So far, you already get better performance.
However, the throttling control of your target server may not always be static.
Many websites use dynamic throttling control.


If throttling is detected, pause your workers for a while
[Figure: Redis (job queue) holds the default, critical, low and yahoo queues, consumed by a pool of worker threads.
1. Pause the yahoo queue when throttled.
2. Schedule a job, in another queue, to be executed after a few seconds to "unpause" the yahoo queue.
3. The yahoo queue is resumed after the unpause job is executed.]
class SomeWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'yahoo' # e.g. the yahoo queue from the figure

  def perform
    # try to crawl something
    # ...

    # `throttled` is a placeholder for your own throttling detection
    if throttled
      queue_name = self.class.get_sidekiq_options['queue']
      queue = Sidekiq::Queue.new(queue_name)
      queue.pause! # queue pausing is a Sidekiq Pro feature
      ResumeJobQueueWorker.perform_in(30.seconds, queue_name)
    end
  end
end
class ResumeJobQueueWorker
  include Sidekiq::Worker
  # unique: :until_executed is sidekiq-unique-jobs gem syntax;
  # Sidekiq Enterprise's unique jobs use unique_for instead
  sidekiq_options queue: :queue_control, unique: :until_executed

  def perform(queue_name)
    queue = Sidekiq::Queue.new(queue_name)
    queue.unpause! if queue.paused?
  end
end
The queue for ResumeJobQueueWorker
MUST NOT be the same as the paused queue
We have a dedicated queue for
ResumeJobQueueWorker
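Why must the resume job live in a different queue? A tiny in-memory model of a fetcher that skips paused queues makes it visible. Everything here (`QueueSystem`, `work_off`, `simulate`, the job hashes) is hypothetical, ignores the 30-second delay, and only mimics the behavior described above; it is not Sidekiq's fetch logic.

```ruby
# Toy model: named queues, a list of paused queue names, and a fetcher that
# only takes jobs from queues that are not paused.
class QueueSystem
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
    @paused = []
    @log = []
  end

  def push(queue, job)
    @queues[queue] << job
  end

  def pause!(queue)
    @paused << queue unless @paused.include?(queue)
  end

  def unpause!(queue)
    @paused.delete(queue)
  end

  # Processes jobs until every non-paused queue is empty; returns the processed jobs.
  def work_off
    loop do
      name, jobs = @queues.find { |q, js| !@paused.include?(q) && js.any? }
      break unless name
      job = jobs.shift
      @log << job
      unpause!(job[:resume]) if job[:resume] # plays the role of ResumeJobQueueWorker
    end
    @log
  end
end

def simulate(resume_queue)
  sys = QueueSystem.new
  sys.push('yahoo', { name: 'crawl 1' })
  sys.push('yahoo', { name: 'crawl 2' })
  sys.pause!('yahoo') # throttled: pause the queue
  sys.push(resume_queue, { name: 'resume yahoo', resume: 'yahoo' })
  sys.work_off.map { |job| job[:name] }
end

p simulate('queue_control') # => ["resume yahoo", "crawl 1", "crawl 2"]
p simulate('yahoo')         # => [] (the resume job is stuck behind its own pause)
```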
Decrease the Sidekiq server poll interval for more precise timing control
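Jobs scheduled with `perform_in` (such as the 30-second `ResumeJobQueueWorker`) wait in Sidekiq's scheduled set until a poller moves them to their queue, so the poll interval bounds how precisely a paused queue is resumed. A config sketch, assuming a Sidekiq version that exposes `average_scheduled_poll_interval`; the value 5 is only an example, so check your version's default before changing it.

```ruby
# config/initializers/sidekiq.rb (sketch)
Sidekiq.configure_server do |config|
  # Target average interval, in seconds, between scheduled-set polls.
  # Lower values give more precise timing for perform_in jobs at the cost of more Redis traffic.
  config.average_scheduled_poll_interval = 5
end
```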
Queue pausing alleviates throttling issues
Is it possible for us to do even better?
Most throttling controls aim to block requests from the same IP address
We can change our IP address via a proxy service
[Figure: without a proxy, every request from the Sidekiq server reaches the target server from the same IP (a.b.c.d). With a proxy service endpoint, requests are relayed through different proxy servers (e.f.g.h, i.j.k.l, m.n.o.p, q.r.s.t), so the target server sees a different IP for each request.]
Different IP for each request
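On the crawler side, pointing Ruby's `Net::HTTP` at a single proxy service endpoint is enough; in this design, rotating the exit IP is the service's job. A sketch with hypothetical values: `proxy.example.com:8080` stands in for your provider's endpoint, `http_via_proxy` is an illustrative helper, and credentials, if any, come from hypothetical `PROXY_USER`/`PROXY_PASS` environment variables.

```ruby
require 'net/http'
require 'uri'

# Hypothetical proxy service endpoint; the service picks a different
# outbound proxy server (and IP) for each request.
PROXY_HOST = 'proxy.example.com'
PROXY_PORT = 8080

# Builds (but does not start) an HTTP client that goes through the proxy.
def http_via_proxy(url)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT,
                       ENV['PROXY_USER'], ENV['PROXY_PASS'])
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = 5
  http.read_timeout = 10
  http
end

http = http_via_proxy('https://example.com/')
puts http.proxy?        # => true
puts http.proxy_address # => proxy.example.com
# Inside a worker: response = http.request(Net::HTTP::Get.new(URI('https://example.com/')))
```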
• Inherent problems of Unix Cron:
• Unreliable scheduling (solved)
• Hard to prioritize jobs by popularity (solved)
• High availability is not easy (solved)
• Not easy to deal with bandwidth throttling issues (solved)
• With Sidekiq (Enterprise) and a proper design, the following problems are solved:
• Slow crawler
• Inefficient - unable to retry only the failed one
• Unpredictable server loading
• Scaling out is not easy
• Inherent problems of Unix Cron
• Not easy to deal with bandwidth throttling issues
Building Efficient and Reliable Crawler System With Sidekiq Enterprise

More Related Content

What's hot

Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
George Wilson
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systems
Bill Buchan
 
The Many Ways to Test Your React App
The Many Ways to Test Your React AppThe Many Ways to Test Your React App
The Many Ways to Test Your React App
All Things Open
 
Enterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache CamelEnterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache Camel
Ioan Eugen Stan
 
Developing Microservices with Apache Camel
Developing Microservices with Apache CamelDeveloping Microservices with Apache Camel
Developing Microservices with Apache Camel
Claus Ibsen
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - Copenhagen
Claus Ibsen
 
Modernizing Legacy Applications in PHP, por Paul Jones
Modernizing Legacy Applications in PHP, por Paul JonesModernizing Legacy Applications in PHP, por Paul Jones
Modernizing Legacy Applications in PHP, por Paul Jones
iMasters
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
Claus Ibsen
 
Camel Day Italy 2021 - What's new in Camel 3
Camel Day Italy 2021 - What's new in Camel 3Camel Day Italy 2021 - What's new in Camel 3
Camel Day Italy 2021 - What's new in Camel 3
Claus Ibsen
 
Apache Camel K - Fredericia
Apache Camel K - FredericiaApache Camel K - Fredericia
Apache Camel K - Fredericia
Claus Ibsen
 
Reactive Xamarin. UA Mobile 2016.
Reactive Xamarin. UA Mobile 2016.Reactive Xamarin. UA Mobile 2016.
Reactive Xamarin. UA Mobile 2016.
UA Mobile
 
PAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLERPAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLER
Neotys
 
Ruby performance - The low hanging fruit
Ruby performance - The low hanging fruitRuby performance - The low hanging fruit
Ruby performance - The low hanging fruit
Bruce Werdschinski
 
Developer day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast DeploymentsDeveloper day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast Deployments
Matthew Cwalinski
 
Apache Camel Introduction & What's in the box
Apache Camel Introduction & What's in the boxApache Camel Introduction & What's in the box
Apache Camel Introduction & What's in the box
Claus Ibsen
 
State of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to comeState of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to come
Konrad Malawski
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
Chris Baldock
 
ApacheCon EU 2016 - Apache Camel the integration library
ApacheCon EU 2016 - Apache Camel the integration libraryApacheCon EU 2016 - Apache Camel the integration library
ApacheCon EU 2016 - Apache Camel the integration library
Claus Ibsen
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
Pavel Chunyayev
 
Using Apache Camel as AKKA
Using Apache Camel as AKKAUsing Apache Camel as AKKA
Using Apache Camel as AKKA
Johan Edstrom
 

What's hot (20)

Modern websites in 2020 and Joomla
Modern websites in 2020 and JoomlaModern websites in 2020 and Joomla
Modern websites in 2020 and Joomla
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systems
 
The Many Ways to Test Your React App
The Many Ways to Test Your React AppThe Many Ways to Test Your React App
The Many Ways to Test Your React App
 
Enterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache CamelEnterprise Integration Patterns with Apache Camel
Enterprise Integration Patterns with Apache Camel
 
Developing Microservices with Apache Camel
Developing Microservices with Apache CamelDeveloping Microservices with Apache Camel
Developing Microservices with Apache Camel
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - Copenhagen
 
Modernizing Legacy Applications in PHP, por Paul Jones
Modernizing Legacy Applications in PHP, por Paul JonesModernizing Legacy Applications in PHP, por Paul Jones
Modernizing Legacy Applications in PHP, por Paul Jones
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
 
Camel Day Italy 2021 - What's new in Camel 3
Camel Day Italy 2021 - What's new in Camel 3Camel Day Italy 2021 - What's new in Camel 3
Camel Day Italy 2021 - What's new in Camel 3
 
Apache Camel K - Fredericia
Apache Camel K - FredericiaApache Camel K - Fredericia
Apache Camel K - Fredericia
 
Reactive Xamarin. UA Mobile 2016.
Reactive Xamarin. UA Mobile 2016.Reactive Xamarin. UA Mobile 2016.
Reactive Xamarin. UA Mobile 2016.
 
PAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLERPAC 2019 virtual Christoph NEUMÜLLER
PAC 2019 virtual Christoph NEUMÜLLER
 
Ruby performance - The low hanging fruit
Ruby performance - The low hanging fruitRuby performance - The low hanging fruit
Ruby performance - The low hanging fruit
 
Developer day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast DeploymentsDeveloper day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast Deployments
 
Apache Camel Introduction & What's in the box
Apache Camel Introduction & What's in the boxApache Camel Introduction & What's in the box
Apache Camel Introduction & What's in the box
 
State of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to comeState of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to come
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
 
ApacheCon EU 2016 - Apache Camel the integration library
ApacheCon EU 2016 - Apache Camel the integration libraryApacheCon EU 2016 - Apache Camel the integration library
ApacheCon EU 2016 - Apache Camel the integration library
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 
Using Apache Camel as AKKA
Using Apache Camel as AKKAUsing Apache Camel as AKKA
Using Apache Camel as AKKA
 

Viewers also liked

T2 ejercicio04
T2 ejercicio04T2 ejercicio04
T2 ejercicio04
University of Granada
 
Actividad nº 2
Actividad nº 2Actividad nº 2
Actividad nº 2
Renzo Higinio
 
Taller n°1 yili leidy
Taller n°1 yili leidyTaller n°1 yili leidy
Taller n°1 yili leidy
tatiana sanchez
 
La animales en peligro de extinción
La animales en peligro de extinciónLa animales en peligro de extinción
La animales en peligro de extinción
getina24
 
Steiner
SteinerSteiner
Steiner
João Soares
 
Breaking Bad Habits with GitLab CI
Breaking Bad Habits with GitLab CIBreaking Bad Habits with GitLab CI
Breaking Bad Habits with GitLab CI
Ivan Nemytchenko
 
Alternativas de mitigación
Alternativas de mitigaciónAlternativas de mitigación
Alternativas de mitigación
Marcela Navarro Martínez
 
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
Distilled
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
Andrei Savu
 
Onboarding The Ruby Way
Onboarding The Ruby WayOnboarding The Ruby Way
Onboarding The Ruby Way
Layne McNish
 

Viewers also liked (11)

T2 ejercicio04
T2 ejercicio04T2 ejercicio04
T2 ejercicio04
 
Actividad nº 2
Actividad nº 2Actividad nº 2
Actividad nº 2
 
Taller n°1 yili leidy
Taller n°1 yili leidyTaller n°1 yili leidy
Taller n°1 yili leidy
 
Nrszh
NrszhNrszh
Nrszh
 
La animales en peligro de extinción
La animales en peligro de extinciónLa animales en peligro de extinción
La animales en peligro de extinción
 
Steiner
SteinerSteiner
Steiner
 
Breaking Bad Habits with GitLab CI
Breaking Bad Habits with GitLab CIBreaking Bad Habits with GitLab CI
Breaking Bad Habits with GitLab CI
 
Alternativas de mitigación
Alternativas de mitigaciónAlternativas de mitigación
Alternativas de mitigación
 
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
Onboarding The Ruby Way
Onboarding The Ruby WayOnboarding The Ruby Way
Onboarding The Ruby Way
 

Similar to Building Efficient and Reliable Crawler System With Sidekiq Enterprise

End to-end async and await
End to-end async and awaitEnd to-end async and await
End to-end async and await
vfabro
 
Manchester Serverless Meetup - July 2018
Manchester Serverless Meetup - July 2018Manchester Serverless Meetup - July 2018
Manchester Serverless Meetup - July 2018
Jonathan Vines
 
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014
Jon Milsom
 
Continuous Integration, the minimum viable product
Continuous Integration, the minimum viable productContinuous Integration, the minimum viable product
Continuous Integration, the minimum viable product
Julian Simpson
 
Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Asynchronous Processing with Ruby on Rails (RailsConf 2008)Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Asynchronous Processing with Ruby on Rails (RailsConf 2008)
Jonathan Dahl
 
Writing Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & AkkaWriting Asynchronous Programs with Scala & Akka
Writing Asynchronous Programs with Scala & Akka
Yardena Meymann
 
Building source code level profiler for C++.pdf
Building source code level profiler for C++.pdfBuilding source code level profiler for C++.pdf
Building source code level profiler for C++.pdf
ssuser28de9e
 
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...
Databricks
 
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVMVoxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Manuel Bernhardt
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Amazon Web Services
 
從零開始的爬蟲之旅 Crawler from zero
從零開始的爬蟲之旅 Crawler from zero從零開始的爬蟲之旅 Crawler from zero
從零開始的爬蟲之旅 Crawler from zero
Shi-Ken Don
 
Rails Performance Tricks and Treats
Rails Performance Tricks and TreatsRails Performance Tricks and Treats
Rails Performance Tricks and Treats
Marshall Yount
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Tim Callaghan
 
Advanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesAdvanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutes
Hiroshi SHIBATA
 
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
Sumit Rangwala
 
Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019
Eliran Eliassy
 
Queue your work
Queue your workQueue your work
Queue your work
Jurian Sluiman
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
Jeff Geerling
 
Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)Serverless in production, an experience report (FullStack 2018)
Serverless in production, an experience report (FullStack 2018)
Yan Cui
 
WinOps Conf 2016 - Michael Greene - Release Pipelines
WinOps Conf 2016 - Michael Greene - Release PipelinesWinOps Conf 2016 - Michael Greene - Release Pipelines
WinOps Conf 2016 - Michael Greene - Release Pipelines
WinOps Conf
 
Building Efficient and Reliable Crawler System With Sidekiq Enterprise