How could I automate log gathering in the distributed system


I attended the Korea Perl Workshop 2012.
This is my presentation.


  1. 1. How could I automate log gathering in the distributed system using Perl? A system programmer’s survival(!) story, from developing Ethernet/IP modules for an EPC core system.
  2. 2. Background of life(?) story
  3. 3. A long time ago… there was a S/W developer.
  4. 4. Happy Life: He worked at ‘S’ company. Actually, ‘S’ company has heavy workloads. Coincidentally, his team had a lot of free time.
  5. 5. Then one day… he was transferred to the EPC core system development team.
  6. 6. EPC? Evolved Packet Core. It is the core system for LTE service.
  7. 7. Do you want to know about EPC/LTE? That is out of scope for this seminar.
  8. 8. Moreover… I didn’t know EPC/LTE technology in detail. I still don’t.
  9. 9. My Duty: Develop network device drivers ◦ Ex) switch. Modify the network layer in the Linux kernel. Handle L2/L3 protocols.
  10. 10. Beginning of Hardship
  11. 11. First Challenge: The EPC system was a first for the company ◦ If you are a developer, you will fully understand what that means.
  12. 12. Human resource problem: Hundreds of engineers were involved in this project. Overall, that doesn’t seem bad.
  13. 13. But… the OS/DD team consisted of 3 senior engineers and one newbie ◦ In particular, device drivers & L2/L3 protocols → only one guy.
  14. 14. YES!! And it was me! (Image: ‘Home Alone’)
  15. 15. Lack of useful tools: In the early development stage, useful tools were not ready.
  16. 16. Basically, a network core system is huge, complex, and difficult. Take one more look!!
  17. 17. And (just my feeling) it was horrible & heavy work.
  18. 18. Anyway, I solved many difficult problems ◦ In the end, I survived.
  19. 19. How Perl saved me from annoying work
  20. 20. First: we need a basic understanding of the system architecture.
  21. 21. Please, Don’t sleep
  22. 22. What matters in an EPC system: Services are distributed. It must provide fail-over & non-stop service ◦ HA (High Availability)
  23. 23. The basic composition of the system: Management boards ◦ Master & secondary master board → If the master board fails, the secondary master quickly takes over the management role (HA). Service boards ◦ Various call/protocol services. Other service connections ◦ Ex) AAA. All boards are connected via gigabit Ethernet. “I can’t give detailed or exact information for security reasons.”
  24. 24. Shape of the physical system. * This is just a reference image of the system.
  25. 25. Let’s Imagine!! In this architecture,
  26. 26. if some problem occurs, how do we debug it?
  27. 27. Various causes – S/W • Which layer? • Application layer • Protocols (network layer) • Device driver • Kernel • Which slot (board)?
  28. 28. Various causes – configuration: Mistyping ◦ Ex) illegal number. Plain mistakes ◦ Someone changed the physical configuration without notice while a batch job was running. Application problems ◦ Shell, reporter, statistics apps. Misconfiguration ◦ A tester’s misunderstanding of the network/service.
  29. 29. Various causes – H/W: Faulty board. Faulty chip. Faulty cable. Faulty chassis. Other ◦ Ex) electrical damage.
  30. 30. We can picture what this situation looks like.
  31. 31. How to clarify it?
  32. 32. Show me the LOG!! Various status information, error/warning messages, some dumps, and so on… ◦ These are stored in the system as log files.
  33. 33. In the finished stage… many utilities and shell commands are provided.
  34. 34. But in the early days, I had to collect the various logs from each board manually.
  35. 35. More limitations: Per chassis, only the management boards have a public IP address and are connected to the external network. The other boards have only private IP addresses and are reachable only from the management (M.G) board.
  36. 36. Limitations (cont.): Users could log in to the service boards only from the M.G board.
  37. 37. Sometimes I had to run a debugging tool directly on each board to get specific register values ◦ Ex) PHY, switch, etc. The switch ASIC ◦ has a huge and complex register set.
  38. 38. That job… was very troublesome and took a lot of time.
  39. 39. An even sadder story: if a hang-up or service failure occurred,
  40. 40. the OS/DD team had to investigate it first. Yes, I was on this team. Yes, only 3+1 humans.
  41. 41. How to automatically: log in to each board; find & check files; transfer log files; check for changes in the system; execute an external command or application and get its result; extract data from log files; etc.
  42. 42. We already know the answer.
  43. 43. Perl
  44. 44. AND the Comprehensive Perl Archive Network
  45. 45. CPAN is… (image)
  46. 46. There are many useful modules on CPAN: Net::Telnet, Net::SSH, Net::Ping, Net::FTP, Net::SFTP, Blabla::Bla
  47. 47. But I wanted to… integrate all of these; execute external commands/tools interactively; work around some small issues in the CPAN modules ◦ Some modules had bugs or weaknesses → Ex) the Ping module had an ICMP bug ◦ Some features were not implemented.
  48. 48. Yes! I found Expect.
  49. 49. Expect!! Expect is a Tcl-based application ◦ I didn’t want to learn Tcl. The Expect module is its Perl port.
  50. 50. Simple Usage: Load the module. Run an external application. Control the timeout. Detect the prompt/result with a pattern. Execute a command.
  51. 51. Simple Usage (cont.)

          use Expect;

          # ==========================
          # prepare something
          # ==========================
          # spawn the external application under Expect's control
          my $Agent = Expect->new( $externalApp, $params ) or die "blabla";
          # wait (up to $timeout seconds) for a prompt or result pattern
          $Agent->expect( $timeout, $some_pattern );
          $Agent->send( $some_command );
          # ==========================
          # do something more
          # ==========================
          $Agent->expect( $timeout, $some_pattern4prompt );
          $Agent->send( $exit_command );
          $Agent->soft_close();
  52. 52. Sorry!! I no longer have this code, so I can’t show it.
  53. 53. Instead, I’ll show the overall picture.
  54. 54. [Diagram: Chassis 0 and Chassis 1, each with Slot #0 … Slot #n. Collected items: IP table, ARP, Log aaa, Log bbb, Device info A, Device info B, System start time]
  55. 55. Cost: All modules are free. I spent just 2 hours writing the code ◦ handling all the exceptional cases ◦ finding patterns for login prompts and for the results of external apps ◦ including testing & debugging time.
  56. 56. Benefit: I used to need 15~20 min to get all logs from all boards → now just a few seconds in the regular case → and this was frequent work.
  57. 57. Benefit: Run a batch process every night ◦ We tested a new service or release S/W every night ◦ My solution was used for a few days ◦ Before long, another reporting tool was ready.
  58. 58. Thanks, Perl. Perl saved my life from a lot of dirty & annoying work.
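
The workflow described on slides 35, 36 and 41 (log in through the management board, hop to a service board, run commands, pull data out of the logs) can be sketched with the Expect module roughly as follows. The original script was lost (slide 52), so this is not the author's code. It is a minimal sketch that assumes SSH with key-based login on both hops and a shell prompt ending in `$`, `#` or `>`. The host names, user, commands and the `ERROR`/`WARN` keywords are all hypothetical placeholders.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: every prompt ends in $, # or >. A real script would use a
# stricter, system-specific pattern to avoid matching ordinary output.
my $PROMPT_RE = '[\$#>]\s*$';

# Pure helper: pull error/warning lines out of captured log text.
# The keywords are illustrative; real logs may use other markers.
sub extract_problems {
    my ($text) = @_;
    return grep { /\b(?:ERROR|WARN(?:ING)?)\b/i } split /\r?\n/, $text;
}

# Send one command in an open Expect session, wait for the next prompt,
# and return everything seen before it (this includes the echoed command).
sub run_command {
    my ($exp, $cmd, $timeout) = @_;
    $exp->send("$cmd\n");
    defined $exp->expect($timeout, '-re', $PROMPT_RE)
        or die "timeout waiting for prompt after '$cmd'\n";
    return $exp->before();
}

# Log in to the management board, hop to one service board, run commands.
# Returns a hash reference: command => captured output.
sub collect_from_board {
    my (%arg) = @_;
    require Expect;    # CPAN module, loaded only when a session is opened
    my $timeout = $arg{timeout} // 10;

    my $exp = Expect->spawn('ssh', "$arg{user}\@$arg{mgmt_host}")
        or die "cannot spawn ssh: $!\n";
    $exp->log_stdout(0);    # keep session chatter off our STDOUT
    defined $exp->expect($timeout, '-re', $PROMPT_RE)
        or die "no prompt from management board\n";

    # Second hop: service boards are reachable only from the management board.
    run_command($exp, "ssh $arg{board_ip}", $timeout);

    my %output = map { $_ => run_command($exp, $_, $timeout) }
                 @{ $arg{commands} };

    $exp->send("exit\n");    # leave the service board
    $exp->send("exit\n");    # leave the management board
    $exp->soft_close();
    return \%output;
}

# Example call (hypothetical addresses and log path):
# my $out = collect_from_board(
#     user      => 'admin',
#     mgmt_host => '192.0.2.10',
#     board_ip  => '10.0.0.3',
#     commands  => ['cat /var/log/messages'],
# );
# print "$_\n" for extract_problems( $out->{'cat /var/log/messages'} );
```

Calling `collect_from_board` once per slot and per chassis gives the kind of nightly batch run mentioned on slide 57. Keeping the log parsing in a separate function means it can be checked without a live system.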