
FetchQ - the origins



This is a story of struggle and friendship. We set out to download all of Instagram and came pretty close to achieving our goal. Along the way, we developed a fast and reliable stateful queue system based on Postgres, which today lets you easily integrate task delegation into your software architecture.

https://youtu.be/g8P_w5dyW3c

Published in: Software

FetchQ - the origins

  1. 1. Can you download the whole of Instagram? A bold tale of struggle and friendship
  2. 2. Agenda ➔ start a demo ➔ understand the challenge ➔ integrate people ➔ HOW WE DID IT ➔ see demo results
  3. 3. CEO COO Prod. owner Developer CTO Business Awareness THE COMPANY GAP Technical Awareness
  4. 4. making instagram profiles SEARCHABLE
  5. 5. THE NEW GUY
  6. 6. CEO COO Prod. owner Developer CTO Business Awareness THE COMPANY GAP Technical Awareness
  7. 7. Getting into engineering Level of Black Magic Technical skills
  8. 8. Level zero in dev. Study & Google copy/paste
 npm install NEED TO REDUCE POINTS
  9. 9. Getting into engineering Technical skills New guy level Level of Black Magic
  10. 10. Gained more knowledge Introduced to “THE BACKEND” First full-stack feature
  11. 11. A NEW HOPE • Look at competitors' data • Find public API
 (JSON-based)
  12. 12. What is a queue?
  13. 13. REQUIREMENTS ● PARALLEL JOBS ● FAULT TOLERANCE ● UNIQUE DOCUMENTS ● RESCHEDULE DOCUMENTS ● AVOID RE-INSERT ● COST EFFICIENT ● JAVASCRIPT
  14. 14. CREATE TABLE tasks ( subject CHARACTER … PRIMARY KEY, next_iteration TIMESTAMP, attempts INTEGER, payload JSONB );
  15. 15. DISCOVERY, PROFILE TRACKER, PROFILE BUILDER
  16. 16. LEARN
  17. 17. Data processing Knowledge 300k 100M+
  18. 18. CHALLENGE: 1 WORKER ~ 10 profiles / minute, 2 WORKERS ~ 13 profiles / minute, 3 WORKERS ~ 14 profiles / minute
  19. 19. SELECT subject FROM tasks WHERE next_iteration <= NOW() LIMIT 1; UPDATE tasks SET next_iteration = NOW() + INTERVAL '5m' WHERE subject = 'xxx';
  20. 20. SELECT subject FROM tasks WHERE next_iteration <= NOW() LIMIT 1; UPDATE tasks SET next_iteration = NOW() + INTERVAL '5m' WHERE subject = 'xxx'; WORKERS GAP (another worker can pick the same subject between the SELECT and the UPDATE)
  21. 21. UPDATE tasks SET next_iteration = NOW() + INTERVAL '5m' WHERE subject IN ( SELECT subject FROM tasks WHERE next_iteration <= NOW() LIMIT 1 FOR UPDATE SKIP LOCKED ) RETURNING subject;
  22. 22. Data processing Knowledge 300k 1M
  23. 23. CHALLENGE: 100% CPU CREDITS LIMIT, 100% DISK I/O LIMIT, ~$200/month “m4.large” machine
  24. 24. INSERT INTO tasks (subject, next_iteration) VALUES ('xxx', NOW() ), ('yyy', NOW() + INTERVAL '5m' ), ('zzz', NOW() - INTERVAL '1y' );
  25. 25. for (const task of stuff_to_insert) { const exists = await query(`SELECT subject FROM tasks WHERE subject = '${task}' LIMIT 1`); if (!exists) { await query("INSERT INTO ...") } }
  26. 26. INSERT INTO tasks (subject, next_iteration) VALUES ('xxx', NOW() ), ('yyy', NOW() + INTERVAL '5m' ), ('zzz', NOW() - INTERVAL '1y' ) ON CONFLICT DO NOTHING;
  27. 27. Data processing Knowledge 300k 1M 5M
  28. 28. CHALLENGE: 40s+ to pick a document!
  29. 29. SELECT subject FROM tasks WHERE next_iteration < NOW() AND attempts < 5 ORDER BY next_iteration ASC, attempts ASC LIMIT 1;
  30. 30. CREATE TABLE tasks ( subject CHARACTER … PRIMARY KEY, next_iteration TIMESTAMP, status INTEGER, attempts INTEGER, payload JSONB );
  31. 31. SELECT subject FROM tasks WHERE status = 1 AND attempts < 5 ORDER BY next_iteration ASC, attempts ASC LIMIT 1;
  32. 32. CREATE INDEX pending ON tasks( next_iteration ASC, attempts ASC ) WHERE ( status = 1 AND attempts < 5 );
  33. 33. CHALLENGE: need to update STATUS
  34. 34. UPDATE tasks SET status = 1 WHERE subject IN ( SELECT subject FROM tasks WHERE status = 0 AND next_iteration < NOW() FOR UPDATE SKIP LOCKED );
  35. 35. Data processing Knowledge 300k 1M 5M 20M
  36. 36. SCALE
  37. 37. Scaling in AWS ➔ EC2 Clicking ➔ Auto Scaling Groups ➔ CloudFormation
  38. 38. CHALLENGE: speed differences among queues, errors (thousands of them!)
  39. 39. WITH BIG DATA COME BIG TROUBLES (true story)
  40. 40. MEET GRAFANA
  41. 41. What to measure?
  42. 42. -- How much accumulated work?
 SELECT COUNT(*) FROM tasks WHERE status = 1;
 -- How much planned work?
 SELECT COUNT(*) FROM tasks WHERE status = 0;
 -- Other metrics…
  43. 43. CREATE TABLE metrics ( metric CHARACTER … NOT NULL, value INTEGER ); CREATE TABLE metrics_logs ( ctime TIMESTAMP, metric CHARACTER … NOT NULL, increment INTEGER );
  44. 44. INSERT INTO metrics_logs ( ctime, metric, increment ) VALUES ( NOW(), 'pending', 1 );
  45. 45. FUNCTIONS
  46. 46. FOR VAR_r IN SELECT DISTINCT ON (metric) metric, increment FROM metrics_logs ORDER BY metric, ctime ASC LOOP … END LOOP;
  47. 47. SELECT SUM(increment) INTO VAR_sum FROM metrics_logs WHERE metric = VAR_r.metric; UPDATE metrics AS t SET value = t.value + VAR_sum WHERE metric = VAR_r.metric;
  48. 48. Data processing Knowledge 300k 1M 5M 20M 100M+
  49. 49. TAKEAWAY ➔ With good communication come good results ➔ Keep it simple ➔ No limits for new knowledge ➔ Mindset, curiosity, improvements
  50. 50. CEO COO Prod. owner Developer CTO Business Awareness THE COMPANY GAP Technical Awareness
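
The bulk insert from slide 26 and the locked pick from slide 21 could be wrapped in small JavaScript helpers. What follows is only a sketch, not FetchQ's actual code: buildInsertTasks and buildPickTask are hypothetical names, the nextIteration field and the '5 minutes' default are assumptions, and the numbered placeholders ($1, $2, …) assume a Postgres client such as node-postgres. Passing values as parameters avoids concatenating subjects into the SQL string, as the loop on slide 25 does.

```javascript
// Hypothetical helpers: names and shapes are illustrative, not FetchQ's API.

// Build one multi-row INSERT with numbered placeholders ($1, $2, ...).
// Rows whose subject already exists are skipped by ON CONFLICT DO NOTHING,
// so no SELECT-before-INSERT round trip is needed.
function buildInsertTasks(tasks) {
  if (tasks.length === 0) {
    throw new Error('buildInsertTasks needs at least one task');
  }
  const values = [];
  const rows = tasks.map((task, i) => {
    values.push(task.subject, task.nextIteration);
    return `($${i * 2 + 1}, $${i * 2 + 2})`;
  });
  const text =
    'INSERT INTO tasks (subject, next_iteration) VALUES ' +
    rows.join(', ') +
    ' ON CONFLICT DO NOTHING';
  return { text, values };
}

// Build the single-statement pick: the inner SELECT locks one due row and
// skips rows already locked by other workers; the outer UPDATE pushes
// next_iteration forward and returns the picked subject.
// lockFor is an assumed parameter, passed as a Postgres interval string.
function buildPickTask(lockFor = '5 minutes') {
  const text =
    'UPDATE tasks SET next_iteration = NOW() + $1::interval ' +
    'WHERE subject IN (' +
    'SELECT subject FROM tasks WHERE next_iteration <= NOW() ' +
    'LIMIT 1 FOR UPDATE SKIP LOCKED' +
    ') RETURNING subject';
  return { text, values: [lockFor] };
}
```

Each helper returns a { text, values } object, which is the query-config shape that node-postgres accepts in client.query(...). A worker would run the pick query in a loop and treat an empty result as "nothing due yet".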
