[@IndeedEng] How to Get a Job 35 Million Times a Day Using RabbitMQ

@IndeedEng March: Wednesday, March 27th
Video available: http://www.youtube.com/watch?v=MeRHetCMiHg

The goal of Indeed's aggregation engine is to find and retrieve every job in the world, as quickly and accurately as possible. As we described in our previous tech talk, we strive to build products that are simple, fast, comprehensive, and relevant. The world's most comprehensive job search site is fueled by the more than 35 million job postings we process every day, which we deliver to jobseekers within minutes of discovery.

Our original aggregation architecture was implemented using standard patterns. Our growth required levels of scalability, performance, and resilience this architecture simply could not handle. In a case study of scaling for the web, we will discuss how we tackled this problem. We will cover the issues we saw with our original architecture, how we analyzed our options to guide a solution, how we used RabbitMQ as a key component in the new architecture, and benchmarks to evaluate how successful we were.

Speaker Ketan Gangatirkar is the development manager responsible for Indeed's continuous deployment infrastructure as well as its aggregation system.

Speaker Cameron Davison is a software engineer on the aggregation team at Indeed and a graduate of UT Austin. He re-architected Indeed's aggregation pipeline using RabbitMQ to sustain high write volumes, and continues to improve products in the aggregation system to make it run more efficiently.

Transcript

  • 1. How to Get a Job 35 Million Times a Day Using RabbitMQ. Ketan Gangatirkar and Cameron Davison
  • 2. One search. All jobs.
  • 3. Aggregation gets jobs
  • 4. Aggregation gets jobs so jobseekers get jobs
  • 5. Aggregation != Spidering. Spiders see pages. Aggregation sees jobs.
  • 6. How spiders see job sites [Diagram: every node is labeled Page]
  • 7. How Indeed sees job sites [Diagram labels: Start, Navigation, Job List, Job]
  • 8. Aggregation != Spidering. Job sites have structure. Job pages have semantics. Navigation is more than following links.
  • 9. Remember this
  • 10. Agg every job
  • 11. What's in a job: { Url: http://www.applytracking.com/track.aspx/3VYzR; Title: Senior Erlang Engineer; Company: Machine Zone; Location: Palo Alto, CA, US, 94301; Source Type: Employer; Job Type: Full-time; ...; Description: The Senior Erlang Engineer is an integral ......; Createdate: 2013-02-05 23:18:05; ... }
  • 12. Location, description, company, title
  • 13. Title, salary, location, job type, description, company
  • 14. How we build products: simple, fast, comprehensive, relevant
  • 15. Simple: Tough problems, simple solutions
  • 16. Fast: Discover the jobs quickly. Get them to jobseekers in minutes.
  • 17. 10% of jobseekers sort by date
  • 18. Do you want only new jobs?
  • 19. 20% of jobseekers want only new jobs
  • 20. Daily new job emails
  • 21. Speed matters
  • 22. Comprehensive: Get every job
  • 23. Relevant: Semantic extraction. The job is still available. Ignore non-jobs.
  • 24. This is a hard problem: Flaky sites. Site redesigns. Javascript. Missing or bad information.
  • 25. Big N makes it even harder: Examine 38M jobs every day
  • 26. Do this in minutes [Diagram labels: Employers, Job Boards, Staffing firms, Recruiters, Aggregation, Search, 100M Jobseekers]
  • 27. Strawman* architecture [Diagram labels: Job site x6, Engine x6, Datacenter A, Datacenter B, MySQL, Primary Datacenter]
  • 28. Limitations
  • 29. N connections [Diagram labels: Job site x6, Engine x6, Datacenter A, Datacenter B, MySQL, Primary Datacenter]
  • 30. N concurrent writers [Diagram labels: Job site x6, Engine x6, Datacenter A, Datacenter B, MySQL, Primary Datacenter]
  • 31. High latency [Diagram labels: Job site x6, Engine x6, Datacenter A, Datacenter B, MySQL, Primary Datacenter]
  • 32. Limitation: failure points [Diagram labels: Job site x6, Engine x6, Datacenter A, Datacenter B, MySQL, Primary Datacenter; two points marked X]
  • 33. Scaling Patterns: What has worked for us so far?
  • 34. Service-Oriented Architecture [Diagram labels: Engine x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter] see http://go.indeed.com/boxcar
  • 35. Standard Service Interaction [Diagram labels: Client, Service, Database]
  • 36. Our Interaction [Diagram labels: Client, Service, Database]
  • 37. Does this do what we need? ● Lots of workers... ● Sending lots of results... ● Over a long distance... ● That need to get processed fast... ● Reliably?
  • 38. Engine Failure [Diagram labels: Engine x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 39. Engine failure fix: Buffer to disk [Diagram labels: Engine x3, disk x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 40. Network Failure [Diagram labels: Engine x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 41. Network failure fix: Disks solve that too [Diagram labels: Engine x3, disk x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 42. Write Service Failure [Diagram labels: Engine x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 43. Write Service Failure fix: Disks solve that too [Diagram labels: Engine x3, disk x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 44. Write Service Failure fix: Redundancy [Diagram labels: Engine x3, Job Write Service x3, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 45. Database Failure [Diagram labels: Engine x3, Job Write Service, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 46. Database Failure fix: Buffer to disk [Diagram labels: Engine x3, Job Write Service, disk, MySQL, Remote Datacenter, Primary Datacenter; one point marked X]
  • 47. Our new architecture [Diagram labels: Engine x3, Job Write Service x3, disk x6, MySQL, Remote Datacenter, Primary Datacenter]
  • 48. We could build this... [Diagram labels: Engine x3, Job Write Service x3, disk x6, MySQL, Remote Datacenter, Primary Datacenter]
  • 49. ... maybe someone already has [Diagram labels: Engine x3, Job Write Service x3, disk x6, MySQL, Remote Datacenter, Primary Datacenter]
  • 50. We should use a message queue
  • 51. Cameron Davison
  • 52. Aggregation Requirements ● Durable ● Multi-Data Center (latency) ● 38 million jobs a day ● 2KB average job size ○ 76 GB a day ● Target peaks of 1000 jobs / second ● Programming language agnostic
  • 53. Selection
  • 54. What we found: High Availability, Open Source/Free, Self-hosted, Performant
  • 55. Out-of-the-box Experience
  • 56. Advanced Message Queuing Protocol (AMQP) ● Open Standard ● Wire protocol ● Existing Clients in Multiple Languages
  • 57. Concepts ● Confirmation and Ack ● At least once ● Asynchronous Confirms ● Persistent ● Clustering
  • 58. Confirmation and Ack [Diagram labels: Producer, MQ, Consumer; arrows labeled msg, confirm, ack, msg; steps numbered 1 to 4]
  • 59. At least once [Diagram labels: MQ, Consumer, Message, Ack]. At most once [Diagram labels: MQ, Consumer, Message, Auto Ack]
  • 60. Asynchronous Confirms [Diagram labels: Producer, messages numbered 1 to 16, confirm #6]
  • 61. Persistent [Diagram labels: Producer, MQ, Consumer]
  • 62. Persistent [Diagram labels: Producer, MQ, Consumer]
  • 63. Persistent [Diagram labels: Producer, MQ, Consumer; one point marked X]
  • 64. Persistent [Diagram labels: Producer, MQ, Consumer]
  • 65. Persistent [Diagram labels: Producer, MQ, Consumer]
  • 66. Clustering [Diagram labels: Producer, Master, Slave; steps numbered 1 to 4]
  • 67. Testing
  • 68. Test RabbitMQ ● Send millions of 2KB messages ● 20 producers and 20 consumers ● 1000 messages / second ● Simulate multiple failures
  • 69. Test Consistency [Diagram labels: Producers, RabbitMQ (Slave), RabbitMQ (Master), Consumers]
  • 70. Test Consistency [Diagram labels: Producers, RabbitMQ (Master), RabbitMQ (Slave), Consumers]
  • 71. Test Consistency [Diagram labels: Producers, RabbitMQ (Master), RabbitMQ (Slave), Consumers]
  • 72. Test Consistency [Diagram labels: Producers, RabbitMQ (X), RabbitMQ (Master), Consumers]
  • 73. Test Consistency [Diagram labels: Producers, RabbitMQ (X), RabbitMQ (Master), Consumers]
  • 74. Test Consistency [Diagram labels: Producers, RabbitMQ (Master), RabbitMQ (Slave), Consumers]
  • 75. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 76. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 77. RabbitMQ Clustering [Diagram labels: Master, X]
  • 78. RabbitMQ Clustering [Diagram labels: Master, X]
  • 79. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 80. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 81. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 82. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 83. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 84. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 85. RabbitMQ Clustering [Diagram labels: Master, X]
  • 86. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 87. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 88. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 89. RabbitMQ Clustering [Diagram labels: Master, X]
  • 90. RabbitMQ Clustering [Diagram labels: Master, X, X]
  • 91. RabbitMQ Clustering [Diagram labels: Master, X]
  • 92. RabbitMQ Clustering [Diagram labels: Master, X]
  • 93. RabbitMQ Clustering [Diagram labels: Master, Slave]
  • 94. Non-persistent: 15990 Messages / Second, 30 MB/s
  • 95. Persistent: 2781 Messages / Second, 5.5 MB/s
  • 96. Clustered and Persistent: 1262 Messages / Second, 2.5 MB/s
  • 97. Applying RabbitMQ
  • 98. Unreliable High Latency Connections [Diagram labels: Engine x3, Job Write Service, MySQL, Remote DC, Primary DC]
  • 99. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ, MySQL, Remote DC, Primary DC]
  • 100. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ, Remote DC, Primary DC]
  • 101. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ, Remote DC, Primary DC]
  • 102. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ, Remote DC, Primary DC]
  • 103. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ, Remote DC, Primary DC]
  • 104. Rabbit can talk to Rabbit: Shovel Plugin [Diagram labels: Producer, RabbitMQ 1, RabbitMQ 2, Consumer]
  • 105. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ x4, Remote DC, Primary DC]
  • 106. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service, RabbitMQ x5, Remote DC, Primary DC]
  • 107. Parallelize Job Write Service [Diagram labels: RabbitMQ, Job Write Service x3, Job A, Job B, Job C]
  • 108. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Remote DC, Primary DC]
  • 109. Replaced with RabbitMQ [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Primary DC]
  • 110. Message Flow
  • 111. Message Flow [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Primary DC]
  • 112. Message Flow [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Primary DC]
  • 113. Message Flow [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Primary DC]
  • 114. Message Flow [Diagram labels: Engine x3, Job Write Service x2, RabbitMQ x5, Primary DC]
  • 115. Jobs/minute
  • 116. Jobs/minute from one site: 220,000 jobs, 6 hours, 611 jobs / minute
  • 117. Jobs/minute from one site: 251,000 jobs, 20 minutes, 12550 jobs / minute
  • 118. Horizontal Scale [Diagram labels: Engine x3, RabbitMQ x7, Job Write Service x4]
  • 119. Horizontal Scale
  • 120. Horizontal Scale
  • 121. Today 1000 messages / second
  • 122. RabbitMQ 3: 2486 Messages / Second, 5 MB/s
  • 123. RabbitMQ Configuration ● Confirmations - Fire and Forget ● Persistent Messages - Durable ● Shoveling - Multi-Data Center ● Mirrored Queues in Cluster - High Reliability
  • 124. Can we do more with RabbitMQ?
  • 125. Aggregation Viewer: Real-time browser-based view of job stream
  • 126. Aggregation Viewer Architecture ● Almost real-time ● Exclusive queue ● Transient messages [Diagram labels: Agg Jobs Rabbit MQ Cluster, Shovel*, Agg Viewer Rabbit MQ, Subscribe, Jobs, AggViewer, HTTP, Browser]
  • 127. Resume Contacts Billing. Pay-per-contact: limited budget
  • 128. Resume Contacts Billing: Original Path [Diagram labels: Asia DC, Pacific, US DC, Resume Search, Log repo, MySQL] see http://go.indeed.com/logrepo
  • 129. Resume Contacts Billing: Fast Path [Diagram labels: Asia DC, Pacific, US DC, Resume Search, RabbitMQ x2, Log repo, MySQL; one point marked X]
  • 130. Company Page Edits: User-contributed content about companies
  • 131. Company Page
  • 132. Company Page Edits Implementation: Writing data AND reading it back
  • 133. Company Page Edits, Single Datacenter [Diagram labels: Browser, Web Server, MySQL]
  • 134. Company Page Serving [Diagram labels: Browser, Web Server, Memcached, LSM Tree, Asia Datacenter] see http://go.indeed.com/lsmtree
  • 135. Company Page Edits [Diagram labels: Browser, Web Server, Memcached, RabbitMQ x2, MySQL, Asia Datacenter, Pacific, Primary US Datacenter, Atlantic, EU Datacenter, [Et cetera]]
  • 136. Company Page Reads [Diagram labels: MySQL, LSM Tree Builder, LSM Tree x2, Asia Datacenter, Pacific, Primary US Datacenter, Atlantic, EU Datacenter, [Et cetera]]
  • 137. Company Pages System [Diagram labels: Browser, Web Server, Memcached, RabbitMQ x2, MySQL, LSM Tree Builder, LSM Tree x2, Asia Datacenter, Pacific, Primary US Datacenter, Atlantic, EU Datacenter, [Et cetera]]
  • 138. Other applications
  • 139. Company Pages
  • 140. Recap: The jobs must flow ● Durability ● High throughput ● Low latency ● Partition-tolerance ● Efficient use of the database ● Minimal points of failure
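
The sizing figures on slide 52 and the benchmark rates on slides 94 to 96 can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes that "2KB" means 2,000 bytes (which is what makes the 76 GB figure come out exactly), and the variable and function names are made up for illustration.

```python
# Figures quoted from slides 52 and 94-96; everything else is derived.
JOBS_PER_DAY = 38_000_000
AVG_JOB_BYTES = 2_000            # assumption: "2KB" read as 2,000 bytes
PEAK_JOBS_PER_SECOND = 1_000
SECONDS_PER_DAY = 24 * 60 * 60

# Benchmark throughput in messages per second, as quoted on the slides.
BENCHMARK_MSGS_PER_SECOND = {
    "non-persistent": 15_990,
    "persistent": 2_781,
    "clustered and persistent": 1_262,
}

# Daily volume in decimal gigabytes.
daily_gb = JOBS_PER_DAY * AVG_JOB_BYTES / 1e9

# Average arrival rate if jobs were spread evenly over the day.
average_jobs_per_second = JOBS_PER_DAY / SECONDS_PER_DAY

# Bandwidth needed at the target peak, in decimal megabytes per second.
peak_mb_per_second = PEAK_JOBS_PER_SECOND * AVG_JOB_BYTES / 1e6


def headroom(msgs_per_second):
    """Ratio of a measured rate to the target peak rate."""
    return msgs_per_second / PEAK_JOBS_PER_SECOND


print(f"daily volume: {daily_gb:.0f} GB")
print(f"average rate: {average_jobs_per_second:.0f} jobs/s")
print(f"peak bandwidth: {peak_mb_per_second:.0f} MB/s")
for name, rate in BENCHMARK_MSGS_PER_SECOND.items():
    print(f"{name}: {headroom(rate):.2f}x the target peak")
```

Under these assumptions every benchmarked configuration, including the slowest (clustered and persistent), stays above the stated target peak of 1000 jobs per second.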
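
Slide 59 contrasts at-least-once delivery (explicit Ack) with at-most-once delivery (Auto Ack). The toy simulation below illustrates the difference without a broker. `ToyQueue` and `run` are hypothetical names, not RabbitMQ APIs, and the consumer crash on `job-B` is an invented scenario.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue (hypothetical, not RabbitMQ)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = {}       # delivery tag -> message awaiting ack
        self._next_tag = 1

    def get(self, auto_ack):
        """Deliver the next message, or None when the queue is empty."""
        if not self._ready:
            return None
        msg = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        if not auto_ack:
            # Manual ack: the queue keeps the message until it is acked.
            self._unacked[tag] = msg
        return tag, msg

    def ack(self, tag):
        self._unacked.pop(tag, None)

    def consumer_lost(self):
        """Requeue everything that was delivered but never acked."""
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))


def run(auto_ack):
    """Consume three jobs; the consumer crashes once while handling job-B."""
    queue = ToyQueue(["job-A", "job-B", "job-C"])
    processed = []
    crashed = False
    while True:
        item = queue.get(auto_ack)
        if item is None:
            break
        tag, msg = item
        if msg == "job-B" and not crashed:
            crashed = True
            queue.consumer_lost()    # crash before the work is finished
            continue
        processed.append(msg)
        if not auto_ack:
            queue.ack(tag)
    return processed


print("auto ack:  ", run(auto_ack=True))
print("manual ack:", run(auto_ack=False))
```

With auto ack the message in flight during the crash is gone; with manual ack it is redelivered. A crash after processing but before the ack would instead produce a duplicate, which is why the guarantee is "at least once" rather than "exactly once".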
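
Slide 60 shows a producer with messages 1 to 16 in flight and a single "confirm #6". One plausible reading is a cumulative confirm: RabbitMQ publisher confirms carry a delivery tag and a `multiple` flag, and when the flag is set, every message up to and including that tag is confirmed. The sketch below models only the producer-side bookkeeping; `ConfirmTracker` is a hypothetical helper, not part of any client library.

```python
class ConfirmTracker:
    """Tracks published messages that the broker has not yet confirmed."""

    def __init__(self):
        self._next_seq = 1
        self._unconfirmed = {}   # sequence number -> payload

    def publish(self, payload):
        """Record a message as in flight and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._unconfirmed[seq] = payload
        return seq

    def on_confirm(self, delivery_tag, multiple=False):
        """Handle a confirm; return the sequence numbers it settled."""
        if multiple:
            settled = [s for s in self._unconfirmed if s <= delivery_tag]
        else:
            settled = [delivery_tag] if delivery_tag in self._unconfirmed else []
        for seq in settled:
            del self._unconfirmed[seq]
        return settled

    def pending(self):
        """Sequence numbers that would need republishing after a failure."""
        return sorted(self._unconfirmed)


tracker = ConfirmTracker()
for n in range(1, 17):
    tracker.publish(f"job-{n}")      # publish without waiting for confirms

settled = tracker.on_confirm(6, multiple=True)
print("settled:", settled)
print("pending:", tracker.pending())
```

Because the producer never blocks on an individual confirm, it can keep publishing at full speed and only needs to retain the still-pending messages for possible republishing.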
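
The per-site rates on slides 116 and 117 follow directly from the job counts and durations shown. The short check below recomputes them; the slides do not state which architecture each figure belongs to, so the names simply refer to slide numbers.

```python
def jobs_per_minute(jobs, minutes):
    """Average throughput for a crawl of one site."""
    return jobs / minutes


# Slide 116: 220,000 jobs in 6 hours.
slide_116_rate = jobs_per_minute(220_000, 6 * 60)

# Slide 117: 251,000 jobs in 20 minutes.
slide_117_rate = jobs_per_minute(251_000, 20)

ratio = slide_117_rate / slide_116_rate

print(f"slide 116: {slide_116_rate:.0f} jobs/minute")
print(f"slide 117: {slide_117_rate:.0f} jobs/minute")
print(f"ratio: {ratio:.1f}x")
```

The second figure works out to roughly twenty times the first.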