Punch clock for debugging Apache Storm

Motivation:

To find out:

When did the batch enter/exit the Spout/Bolt?

Which batch is still in the Spout/Bolt, i.e. are any batches STUCK?

On which host are they stuck?

In which Spout/Bolt are they stuck?

Published in: Engineering

  1. Punch clock for Apache Storm (just an idea)
  6. Punch clock (a.k.a. time clock)
     ● You have one card per person.
     ● The person punches IN with the card on entering the office.
     ● The person punches OUT with the card on leaving the office.
     ● The punch clock records the time of entry/exit on the card.
  9. Motivation
     To find out:
     1. When did the person enter/exit the office?
     2. Who is still in the office?
  10. Change of context…
  11. Apache Storm: tuples going in and out of spouts/bolts
  12. Motivation: debugging Apache Storm*
      * Debugging Storm transactional topologies
  15. Debugging transactional topologies
      1. A spout emits a batch of data (tuples), which forms a transaction.
      2. Every bolt in the topology processes that batch of data (tuples).
  19. Motivation
      To find out:
      1. When did the batch enter/exit the Spout/Bolt?
      2. Which batch is still in the Spout/Bolt, i.e. are any batches STUCK?
         a. On which host are they stuck?
         b. In which Spout/Bolt are they stuck?
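Suppose the open cards (punched in but not yet punched out) from every worker have been gathered in one place. The questions above then reduce to a simple filter. The sketch below makes several assumptions, none of which come from the slides:

- Each open card carries a host, a component (spout/bolt) name, a batch id and its punch-in time.
- A batch counts as "stuck" once its card has been open longer than a chosen threshold.
- The `OpenCard` class, the host and component names, and the thresholds are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StuckBatchReport {
    // Hypothetical shape of one open punch card (punched in, not yet punched out).
    static final class OpenCard {
        final String host;
        final String component;
        final String batchId;
        final long punchInMillis;

        OpenCard(String host, String component, String batchId, long punchInMillis) {
            this.host = host;
            this.component = component;
            this.batchId = batchId;
            this.punchInMillis = punchInMillis;
        }
    }

    // Returns host -> spout/bolt -> ids of batches whose card has been open
    // for longer than thresholdMillis, i.e. candidates for "stuck".
    static Map<String, Map<String, List<String>>> stuck(
            List<OpenCard> openCards, long nowMillis, long thresholdMillis) {
        Map<String, Map<String, List<String>>> byHost = new TreeMap<>();
        for (OpenCard card : openCards) {
            if (nowMillis - card.punchInMillis > thresholdMillis) {
                byHost.computeIfAbsent(card.host, h -> new TreeMap<>())
                      .computeIfAbsent(card.component, c -> new ArrayList<>())
                      .add(card.batchId);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<OpenCard> open = List.of(
                new OpenCard("worker-1", "dbWriterBolt", "txn-41", 1_000),
                new OpenCard("worker-2", "countBolt", "txn-42", 59_000),
                new OpenCard("worker-1", "dbWriterBolt", "txn-43", 2_000));
        // At t = 60 s with a 30 s threshold, txn-42 (open for 1 s) is not reported.
        System.out.println(stuck(open, 60_000, 30_000));
        // prints {worker-1={dbWriterBolt=[txn-41, txn-43]}}
    }
}
```

Grouping by host first and then by component answers "on which host" and "in which Spout/Bolt" in one pass.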
  23. Possible solution(s):
      Add a log statement before and after the critical section.

          log.info("Inserting data into database...");  // ← entering
          datasource.insert(table, tuples);             // ← the real work
          log.info("Inserted data into database.");     // ← exiting

      Cons: the logs are distributed over multiple hosts, so they need to be aggregated, which takes a bit of work (Elasticsearch and Kibana?).
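If you do go the logging route, an enter/exit pair is only useful after aggregation if both lines carry enough context to be matched up. Below is a minimal sketch using `java.util.logging`. It assumes the host name, the component name and a batch id are what you will search on. `runLogged` and the `ENTER`/`EXIT` tag format are illustrative choices, not something from the slides.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Logger;

public class BatchLogging {
    private static final Logger LOG = Logger.getLogger(BatchLogging.class.getName());

    // Best-effort host name, so lines from different workers can be told apart.
    static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    // Wraps the critical section in ENTER/EXIT lines that share the same
    // host, component and batch id, so they can be paired after aggregation.
    // There is deliberately no try/finally: if the work throws, no EXIT line is
    // written and the batch shows up as entered but never exited.
    static void runLogged(String component, String batchId, Runnable criticalSection) {
        String tag = "host=" + hostName() + " component=" + component + " batch=" + batchId;
        LOG.info("ENTER " + tag);
        criticalSection.run(); // e.g. the datasource.insert(table, tuples) call from the slide
        LOG.info("EXIT " + tag);
    }

    public static void main(String[] args) {
        runLogged("dbWriterBolt", "txn-42", () -> System.out.println("inserting rows..."));
    }
}
```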
  24. Possible solution(s): use Riemann (http://riemann.io/index.html). This was suggested by my friend Angad; I have not looked into it, though.
  26. My idea
      Batches of tuples punch IN and punch OUT in a bolt/spout.
      ● Punch In: put the batch's card into a hashmap (or any other suitable data structure).
      ● Punch Out: remove it from the hashmap (or any other suitable data structure).
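Below is a minimal sketch of such a punch clock, assuming one singleton per worker JVM and String punch card ids. The method names mirror the `PunchClock.getInstance().punchIn(...)` / `punchOut(...)` calls on the following slides, but the body is a guess, not the code from the storm-punch-clock repository.

Two design choices are worth noting:

- A `ConcurrentHashMap` is used because several executor threads in the same worker would share the clock.
- `putIfAbsent` keeps the first punch-in time if the same card is punched in more than once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-JVM punch clock: a card is in the map only while its batch is "in".
public final class PunchClock {
    private static final PunchClock INSTANCE = new PunchClock();

    // punch card id -> punch-in time (epoch millis)
    private final Map<String, Long> openCards = new ConcurrentHashMap<>();

    private PunchClock() {}

    public static PunchClock getInstance() {
        return INSTANCE;
    }

    // Punch In: put the card into the map, keeping the first punch-in time if already present.
    public void punchIn(String punchCardId) {
        openCards.putIfAbsent(punchCardId, System.currentTimeMillis());
    }

    // Punch Out: remove the card; whatever remains in the map is still "in".
    public void punchOut(String punchCardId) {
        openCards.remove(punchCardId);
    }

    // Immutable snapshot of cards that punched in but not out.
    public Map<String, Long> openCards() {
        return Map.copyOf(openCards);
    }

    public static void main(String[] args) {
        PunchClock clock = PunchClock.getInstance();
        clock.punchIn("Bolt__12__1700000000000");
        clock.punchIn("Spout__7__1700000000001");
        clock.punchOut("Spout__7__1700000000001");
        System.out.println(clock.openCards().keySet()); // prints [Bolt__12__1700000000000]
    }
}
```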
  27. My idea: a batch of tuples punches IN and punches OUT in a spout.
      In the emitBatch method of the transactional spout:

          PunchClock.getInstance().punchIn(punchCardId);   // ← Punch In
          collector.emit(tuples);                          // ← Emit tuple(s)
          PunchClock.getInstance().punchOut(punchCardId);  // ← Punch Out
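One variation worth considering for the spout is to put the punch out in a `finally` block. That way, an exception thrown while emitting cannot leave a stale card behind. The sketch below is self-contained and does not depend on Storm:

- The `Collector` interface stands in for Storm's output collector.
- A plain map stands in for the punch clock.
- The `Spout__<txId>` card id format is a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SpoutPunchDemo {
    // Stand-in for the per-JVM punch clock: card id -> punch-in time (millis).
    static final Map<String, Long> OPEN_CARDS = new ConcurrentHashMap<>();

    // Stand-in for Storm's output collector; only emit(...) matters here.
    interface Collector {
        void emit(List<Object> tuple);
    }

    // Sketch of an emitBatch body: punch in, emit every tuple, always punch out.
    static void emitBatch(long txId, List<List<Object>> tuples, Collector collector) {
        String punchCardId = "Spout__" + txId; // hypothetical card id format
        OPEN_CARDS.putIfAbsent(punchCardId, System.currentTimeMillis()); // Punch In
        try {
            for (List<Object> tuple : tuples) {
                collector.emit(tuple); // Emit tuple(s)
            }
        } finally {
            OPEN_CARDS.remove(punchCardId); // Punch Out, even if emit throws
        }
    }

    public static void main(String[] args) {
        List<List<Object>> emitted = new ArrayList<>();
        List<List<Object>> batch = List.of(List.<Object>of("a", 1), List.<Object>of("b", 2));
        emitBatch(42L, batch, emitted::add);
        System.out.println(emitted.size() + " tuples, open cards: " + OPEN_CARDS.keySet());
        // prints "2 tuples, open cards: []"
    }
}
```

The trade-off is that with `finally`, a batch whose emit failed disappears from the clock. If you would rather see such batches as open, keep the straight-line version from the slide.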
  28. My idea: a batch of tuples punches IN and punches OUT in a bolt.
      In the prepare method of the transactional bolt, create the punch card for the transaction:

          punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis();

      In the execute method of the transactional bolt, punch in:

          PunchClock.getInstance().punchIn(punchCardId);

      In the finishBatch method of the transactional bolt, punch out:

          PunchClock.getInstance().punchOut(punchCardId);
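The bolt side can be simulated without Storm to show why the call order matters. In a transactional bolt, `prepare` runs once per batch, `execute` once per tuple and `finishBatch` once at the end, so the card gets punched in repeatedly. The sketch below uses a hypothetical `PunchingBolt` class, with a plain map standing in for the punch clock. It uses `putIfAbsent` so that the recorded time is when the batch's first tuple arrived rather than its last.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BoltPunchDemo {
    // Stand-in for the per-JVM punch clock: card id -> first punch-in time (millis).
    static final Map<String, Long> OPEN_CARDS = new ConcurrentHashMap<>();

    // Mimics the three callbacks of a transactional bolt, without depending on Storm.
    static final class PunchingBolt {
        private String punchCardId;

        // prepare: once per batch. Create the punch card (id format taken from the slide).
        void prepare() {
            punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis();
        }

        // execute: once per tuple. putIfAbsent keeps the first punch-in time,
        // so later tuples of the same batch do not overwrite it.
        void execute(Object tuple) {
            OPEN_CARDS.putIfAbsent(punchCardId, System.currentTimeMillis());
            // (the tuple itself would be processed here)
        }

        // finishBatch: once, after all tuples of the batch. Punch out.
        void finishBatch() {
            OPEN_CARDS.remove(punchCardId);
        }
    }

    public static void main(String[] args) {
        PunchingBolt bolt = new PunchingBolt();
        bolt.prepare();
        for (Object tuple : List.of("t1", "t2", "t3")) {
            bolt.execute(tuple);
            System.out.println("after " + tuple + ": open=" + OPEN_CARDS.size());
        }
        bolt.finishBatch();
        System.out.println("after finishBatch: open=" + OPEN_CARDS.size());
        // open stays at 1 while the tuples arrive, then drops to 0
    }
}
```

A thread id plus a millisecond timestamp could collide if the same thread prepares two batches within one millisecond. Appending the transaction attempt id that Storm passes to a transactional bolt's `prepare` method would likely make the card id unique.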
  29. Is it intrusive?
      Yes, but it is just a simple put/remove call on a hashmap, which is cheaper than logging.
  34. Punch clocks
      ● Spouts/bolts are housed in a Storm worker JVM.
      ● There is one punch clock per JVM.
      ● Since we have multiple JVMs, we have multiple punch clocks.
      ● Batches move across Storm workers and we have multiple JVMs, so:
        ○ we need to aggregate the data across punch clocks;
        ○ each punch clock is exposed via JMX.
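Exposing a clock over JMX needs only the JDK. Below is a minimal sketch, assuming a standard MBean with two read-only attributes and the hypothetical ObjectName `storm.debug:type=PunchClock`. An aggregator (or jconsole) would connect to each worker's JMX endpoint, read these attributes and merge them; this requires remote JMX to be enabled on the worker JVMs. That remote part is not shown.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class PunchClockJmx {
    // Management interface: what a JMX client (jconsole, or an aggregator polling each worker) can read.
    public interface PunchClockMBean {
        int getOpenCardCount();

        // Each entry is "<cardId> <punchInMillis>"; plain strings are easy to read remotely.
        String[] getOpenCards();
    }

    // Per-JVM punch clock that also implements the management interface.
    public static final class PunchClock implements PunchClockMBean {
        private final Map<String, Long> open = new ConcurrentHashMap<>();

        public void punchIn(String punchCardId) {
            open.putIfAbsent(punchCardId, System.currentTimeMillis());
        }

        public void punchOut(String punchCardId) {
            open.remove(punchCardId);
        }

        @Override
        public int getOpenCardCount() {
            return open.size();
        }

        @Override
        public String[] getOpenCards() {
            return open.entrySet().stream()
                    .map(e -> e.getKey() + " " + e.getValue())
                    .toArray(String[]::new);
        }
    }

    // Registers the clock with this JVM's platform MBean server under a hypothetical name,
    // replacing any earlier registration under the same name.
    static ObjectName register(PunchClock clock) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("storm.debug:type=PunchClock");
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);
        }
        server.registerMBean(new StandardMBean(clock, PunchClockMBean.class), name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        PunchClock clock = new PunchClock();
        ObjectName name = register(clock);
        clock.punchIn("Bolt__12__1700000000000");
        // Read the attribute back through the MBean server, the way a JMX client would.
        Object count = ManagementFactory.getPlatformMBeanServer().getAttribute(name, "OpenCardCount");
        System.out.println("OpenCardCount = " + count); // prints "OpenCardCount = 1"
    }
}
```

Returning plain strings and an int keeps the attributes readable from a generic JMX client without shipping any custom classes to it.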
  35. Demo
  36. Thank you
      jaihind213@gmail.com
      https://github.com/jaihind213/storm-punch-clock
      sweetweet213@twitter
