How could I automate log gathering in the distributed system

I attended Korea Perl Workshop 2012.
This is my presentation.



  1. 1. How could I automate log gathering in the distributed system using Perl? A system programmer’s survival(!) story, from developing Ethernet/IP modules in an EPC Core system.
  2. 2. Background of life(?) story
  3. 3. A long time ago… there was a S/W developer (Image: http://www.blogsolute.com/9-things-that-shows-you-are-still-a-rookie-in-blogosphere/4037/)
  4. 4. Happy Life: He worked at the ‘S’ company. Actually, the ‘S’ company has heavy workloads. Coincidentally, his team had a lot of free time. Image: http://photo2.si.edu/150now/150visitors.htm
  5. 5. Then one day… he was transferred to the EPC Core system development team. Image: http://thetechtiger.blogspot.kr/2010/12/newbie-spirit.html
  6. 6. EPC? Evolved Packet Core. It is the core system for LTE service. http://www.iphase.com/products/lte_about.cfm
  7. 7. Do you want to know about EPC/LTE? That is out of scope for this seminar.
  8. 8. Moreover… I didn’t know EPC/LTE technology in detail. I still don’t. http://jarielsmith.blogspot.kr/2012/05/lost-my-wallet.html
  9. 9. My Duty: Develop network device drivers ◦ Ex) Switch. Modify the network layer in the Linux kernel. Handle L2/L3 protocols.
  10. 10. Beginning of Hardship
  11. 11. First Challenge: The EPC system was a first-time challenge for the company ◦ If you are a developer, you will fully understand what that means. http://emenshealth.design.co.kr/in_magazine/sub.html?at=view&p_no=&info_id=45762&c_id=00010006
  12. 12. Human resource problem: Hundreds of engineers were involved in this project. Overall, that does not seem bad.
  13. 13. But… the OS/DD team consisted of 3 senior engineers and one newbie ◦ In particular, device drivers & L2/L3 protocols: only one guy. http://wkstudio.bigcartel.com/product/really-onesie
  14. 14. YES!! And it was me! Image: ‘Home Alone’
  15. 15. Lack of useful tools: In the early development stage, useful tools were not ready. http://www.drillspot.com/products/106902/brady_worldwide_inc_65290_lockout_tool_box_no_lockout_devices_included
  16. 16. Basically, a network core system is huge, complex and difficult. Look once more!! http://www.iphase.com/products/lte_about.cfm
  17. 17. And (just my feeling) it was horrible & heavy work
  18. 18. Anyway, I solved many difficult problems ◦ In the end, I survived
  19. 19. How Perl saved me from annoying work
  20. 20. First, we need a basic understanding of the system architecture. http://depositphotos.com/5735004/stock-illustration-School-chalkboard.-Hand-Drawn-Design-Element.html
  21. 21. Please, Don’t sleep http://www.sleep-aid-center.com/how-to-take-a-power-nap-at-work/
  22. 22. What matters in an EPC system: Services are distributed. It must provide fail-over & non-stop service ◦ HA (High Availability)
  23. 23. The basic composition of the system: Management boards ◦ Master & secondary master board: if the master board fails, the secondary master quickly takes over the management role (HA). Service boards ◦ Various call/protocol services. Other service connections ◦ Ex) AAA. All boards are connected via gigabit Ethernet. “I can’t share detailed or exact contents for security reasons.”
  24. 24. Shape of the physical system. * This is just a reference image of the system. http://www.compelgroup.net/english/10_06_advanced_tca_chassis.htm
  25. 25. Let’s Imagine!! In this architecture, http://www.cinema4d.co.kr/freeboard/901145
  26. 26. If some problem occurs, how do we debug it? http://www.wpclipart.com/computer/humour/debugging.png.html
  27. 27. Various causes – S/W • Which layer? • Application layer • Protocols (network layer) • Device driver • Kernel • Which slot (board)? http://www.dicasemgeral.xpg.com.b
  28. 28. Various causes – configuration: Mistyping ◦ Ex) Illegal number. Simple mistakes ◦ Someone changed the physical configuration without notice while a batch job was running. Application problems ◦ Shell, reporter, statistics apps. Misconfiguration ◦ Tester’s misunderstanding of the network/service.
  29. 29. Various causes – H/W: Faulty board, faulty chip, faulty cable, faulty chassis. Other ◦ Ex) Electrical damage
  30. 30. We can imagine the picture in this situation
  31. 31. How to clarify it?
  32. 32. Show me the LOG!! Various status information, error/warning messages, some dumps and blabla… ◦ These are stored in the system as log files
  33. 33. In the finished stage… many utilities and shell commands were provided. http://berxblog.blogspot.kr/
  34. 34. But in the early days, I collected various logs from each board manually. http://blog.naver.com/PostList.nhn?blogId=alwkcjstk
  35. 35. More limitations: Per chassis, only the management boards have public IP addresses and are connected to the external network. The other boards have only private IP addresses and can be reached only from the management (M.G) board.
  36. 36. Limitations (cont.): Users could log in to service boards only from the M.G board. http://www.doyletics.com/mrules.htm
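
The two-hop constraint on these slides (public management board first, then the private board address) can be written down as data. This is only a minimal sketch: the `%CHASSIS` inventory, every address in it, and the helper names `login_path` and `all_boards` are made up for illustration and do not come from the original system.

```perl
use strict;
use warnings;

# Hypothetical inventory: every address below is a made-up example.
my %CHASSIS = (
    0 => { mgmt => '192.0.2.10', slots => { 0 => '10.0.0.1', 1 => '10.0.0.2' } },
    1 => { mgmt => '192.0.2.20', slots => { 0 => '10.0.1.1', 1 => '10.0.1.2' } },
);

# Return the ordered hosts to log in to in order to reach one board:
# first the chassis's management board, then the board's private address.
sub login_path {
    my ($chassis, $slot) = @_;
    my $entry = $CHASSIS{$chassis}
        or die "Unknown chassis $chassis\n";
    my $board = $entry->{slots}{$slot};
    defined $board
        or die "Unknown slot $slot in chassis $chassis\n";
    return ($entry->{mgmt}, $board);
}

# Every [chassis, slot] pair in a stable order, for a full collection run.
sub all_boards {
    return map {
        my $c = $_;
        map { [ $c, $_ ] } sort { $a <=> $b } keys %{ $CHASSIS{$c}{slots} };
    } sort { $a <=> $b } keys %CHASSIS;
}

for my $pair (all_boards()) {
    my ($chassis, $slot) = @$pair;
    print "chassis $chassis slot $slot: ",
        join(' -> ', login_path($chassis, $slot)), "\n";
}
```

Keeping the topology in one table means a collection script only has to loop over `all_boards()` and never hard-codes which board sits behind which management address.
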
  37. 37. Sometimes I should directly execute some debugging tool to get specific register values on the each board ◦ Ex) PHY, Switch, etc. For Switch ASIC, ◦ It has huge registers set and complexity
  38. 38. That job.. It was very troublesome Needed a lot of time http://www.nemopan.com/2650088
  39. 39. More sad story If some hang-up or service fail is occurred, http://www.bazaardesigns.com/8035-glossy-burning-fire-flame/
  40. 40. OS/DD team had to clarify it firstly Yes, I was involved this team Yes, only 3+1 humans
  41. 41. How to automatically: Log in to each board. Find & check files. Transfer log files. Check for system changes. Execute an external command or application and get its result. Extract data from log files. Etc.
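
As an illustration of the "extract data from log files" item, here is a small sketch using only core Perl. The log layout (a level word followed by a message) and the helper name `extract_problems` are assumptions made for this example; the real system's log format is not shown in the slides.

```perl
use strict;
use warnings;

# Pick ERROR/WARNING lines out of log text.
# The "LEVEL: message" layout is an assumed format, not the real system's.
sub extract_problems {
    my ($text) = @_;
    my @hits;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\b(error|warn(?:ing)?)\b[:\s]+(.*)/i;
        push @hits, { level => uc($1), message => $2 };
    }
    return @hits;
}

# Made-up sample lines, only to show the shape of the result.
my $sample = join "\n",
    'Jan  1 00:00:01 eth0: link up',
    'Jan  1 00:00:02 ERROR: PHY reset failed',
    'Jan  1 00:00:03 warning: rx overrun on port 3';

printf "%-7s %s\n", $_->{level}, $_->{message} for extract_problems($sample);
```

Returning a list of hashes rather than printing directly leaves the caller free to count problems per board or feed them into a nightly report.
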
  42. 42. We already know the answer. http://fairfaxvillage.blogspot.kr http://www.staudries-at-ouse.co.uk/SchoolRules.asp
  43. 43. Perl http://www.clickindia.com/detail.php?id=9393605
  44. 44. AND Comprehensive Perl Archive Network http://www.cpan.org/
  45. 45. CPAN is… http://www.pixmac.kr/picture/%EB%B3%B4%EB%AC%BC+%EC%83%81%EC%9E%90/000039689131
  46. 46. There are many useful modules on CPAN: Net::Telnet, Net::SSH, Net::Ping, Net::FTP, Net::SFTP, Blabla::Bla
  47. 47. But I wanted to… Integrate all of these. Execute external commands/tools interactively. Fix some small issues in the CPAN modules ◦ Some modules had bugs or weaknesses: Ex) the Ping module had an ICMP bug ◦ Some features were not implemented
  48. 48. Yes! I found Expect ◦ http://search.cpan.org/~rgiersig/Expect-1.21/
  49. 49. Expect!! Expect is a Tcl-based application ◦ I didn’t want to learn Tcl. The Expect module is its Perl port.
  50. 50. Simple Usage: Load the module. Run an external application. Control the timeout. Detect the prompt/result with a pattern. Execute a command.
  51. 51. Simple Usage (cont.)
      use Expect;
      # ==========================
      # prepare something
      # ==========================
      my $Agent = Expect->new( $externalApp, $params )
          or die "Cannot spawn $externalApp: $!";
      # wait for the first prompt, then send a command
      # (send() does not append a newline by itself)
      $Agent->expect( $timeout, $some_pattern );
      $Agent->send( "$some_command\n" );
      # ========================
      # do something more
      # ========================
      $Agent->expect( $timeout, $some_pattern4prompt );
      $Agent->send( "$exit_command\n" );
      $Agent->soft_close();
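
Since the original script is lost, the following is only a rough reconstruction of how the skeleton on this slide could be filled in, not the author's code. It assumes the CPAN `Expect` module is installed (it is loaded lazily, so the file still compiles without it). The `$PROMPT` pattern and the helpers `run_commands` and `clean_output` are hypothetical, and login/password handling is left out.

```perl
use strict;
use warnings;

# Hypothetical prompt pattern: a line ending in "$ " or "# ".
# Real boards will need their own pattern.
my $PROMPT = '[^\n]*[\$#] $';

# Tidy the text captured before a prompt: drop carriage returns,
# the echoed command (if the far side echoed it), and outer whitespace.
sub clean_output {
    my ($cmd, $raw) = @_;
    $raw =~ s/\r//g;
    $raw =~ s/^\Q$cmd\E\n//;
    $raw =~ s/^\s+|\s+$//g;
    return $raw;
}

# Spawn an interactive program, run each command at its prompt,
# and return a hash ref of { command => output }.
sub run_commands {
    my ($program, $args, $commands, $timeout) = @_;
    require Expect;                 # CPAN module, loaded only when needed

    my $agent = Expect->new;
    $agent->raw_pty(1);             # no local echo; must be set before spawn
    $agent->log_stdout(0);          # keep the session off our terminal
    $agent->spawn($program, @$args)
        or die "Cannot spawn $program: $!\n";

    defined $agent->expect($timeout, '-re', $PROMPT)
        or die "No prompt from $program within ${timeout}s\n";

    my %output;
    for my $cmd (@$commands) {
        $agent->send("$cmd\n");
        defined $agent->expect($timeout, '-re', $PROMPT)
            or die "Timed out waiting for '$cmd'\n";
        $output{$cmd} = clean_output($cmd, $agent->before);
    }

    $agent->send("exit\n");
    $agent->soft_close();
    return \%output;
}
```

A call might look like `run_commands('telnet', ['10.0.0.1'], ['uptime', 'arp -n'], 10)`, where the address and commands are placeholders. Checking the return value of every `expect` call matters here: on timeout it returns undef, and silently continuing would attribute one command's output to the next.
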
  52. 52. Sorry!! I don’t have this code now, so I can’t show it. http://best-messages.blogspot.kr/2010/12/best-sorry-sms-how-to-say-sorry-with.html
  53. 53. Instead, I’ll show the big picture of it
  54. 54. [Diagram] Chassis 0 (Slot #0 … Slot #n) and Chassis 1 (Slot #0 … Slot #n). Items gathered: IP table, Log aaa, Log bbb, ARP, Device info A, Device info B, System start time
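
One way to hold what this overview describes is a nested hash keyed by chassis, slot, and item name. The sketch below is an assumption about a convenient layout, not the original design; the item names in `@EXPECTED` and the helpers `store_item` and `missing_items` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical names for the kinds of items listed on the slide.
my @EXPECTED = qw(ip_table arp device_info system_start_time);

# Store one collected item under chassis -> slot -> item name.
sub store_item {
    my ($results, $chassis, $slot, $name, $data) = @_;
    $results->{$chassis}{$slot}{$name} = $data;
    return $results;
}

# Report which expected items are still missing for each known board.
sub missing_items {
    my ($results, @expected) = @_;
    my @missing;
    for my $chassis (sort { $a <=> $b } keys %$results) {
        for my $slot (sort { $a <=> $b } keys %{ $results->{$chassis} }) {
            for my $name (@expected) {
                push @missing, "chassis $chassis slot $slot: $name"
                    unless exists $results->{$chassis}{$slot}{$name};
            }
        }
    }
    return @missing;
}

# Made-up values, only to show the shape of the structure.
my %results;
store_item(\%results, 0, 0, $_, "(collected $_)") for @EXPECTED;
store_item(\%results, 1, 0, 'arp', '(collected arp)');

print "$_\n" for missing_items(\%results, @EXPECTED);
```

A completeness check like `missing_items` is useful for an unattended nightly run, where a board that hung partway through would otherwise just produce a shorter report.
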
  55. 55. Cost: All modules are free. I spent just 2 hours writing the code ◦ considering all exceptional cases ◦ finding patterns for login prompts and the results of external apps ◦ including testing & debugging time
  56. 56. Benefit: I used to need 15~20 minutes to get all logs from all boards → just a few seconds in the regular case → and this was frequent work
  57. 57. Benefit: Run as a batch process every night ◦ We tested new services or release S/W every night ◦ My solution was used for a few days ◦ Before long, another reporting tool was ready
  58. 58. Thanks, Perl: Perl saved my life from a lot of dirty & annoying work. http://www.e-cute.net/super-happy-baby-with-a-super-happy-camel/
