Linux_Boot_Process,Regular_Expressions,AWK_SED

When I was working at Hewlett-Packard, I submitted this presentation on the topics Linux Boot Process, Regular Expressions, AWK and SED.

Transcript

  • 1. Linux Boot Process, Regex, AWK and SED. Bharathkumarraju Dasararaju. Date: 16/09/2013. © Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • 2. Linux Boot Process. A typical Linux boot process has six stages:
      • BIOS
      • MBR
      • GRUB/LILO
      • Kernel
      • Init
      • Runlevel programs
  • 3. BIOS
      • The Basic Input/Output System (BIOS) is the lowest-level interface to the peripherals and controls the first step of the boot process.
      • The BIOS performs some integrity checks, then searches for, loads and executes the boot loader program. It is independent of the operating system.
      • It looks for bootable media (CD-ROM, hard disk, USB) from which to boot the system. Pressing a key during BIOS start-up (F2, F10 or F12, depending on the BIOS) lets you change the boot sequence. Network boot, usually called PXE boot, has also become common.
      • The BIOS looks for the Master Boot Record (MBR), which sits in the first sector of the boot disk (typically the first hard drive), loads its contents into memory and passes control to it.
  • 4. BIOS
      • The main task the BIOS performs is the POST (Power-On Self-Test): a quick test of memory and a basic check of the fundamental hardware of the computer.
      • The BIOS holds an ordered list of devices, which it checks one by one to see whether an MBR is present on each device.
      • If the BIOS does not find an MBR (i.e. a bootable program) on any of the listed devices, it cannot continue and reports an error such as "bootable media not found".
  • 5. Master Boot Record
      • The MBR is located in the first sector of the bootable disk, typically /dev/hda or /dev/sda, and is 512 bytes in size.
          [root@localhost grub]# ls -l stage1
          -rw-r--r--. 1 root root 512 Jun 18 17:32 stage1
      • The MBR contains the instructions for loading and executing the GRUB boot loader. It also contains the partition table, which describes the partitions on the disk.
      • In the default Red Hat Linux configuration the MBR launches the boot loader automatically, and GRUB then displays the boot options in a menu that provides access to the system. After the boot loader has run, an initial RAM disk (initrd) is set up.
  • 6. GRand Unified Bootloader (GRUB)
      • The boot loader (GRUB, or the older LILO) asks for the OS label, which identifies the kernel to run and where it is located.
      • GRUB has three stages:
          Stage 1: fits in the MBR (512 bytes).
          Stage 1.5: loads the file system drivers (ext3, ext4, FAT etc.).
          Stage 2: displays the boot menu and loads the kernel and initrd; the kernel in turn starts init.
      • GRUB displays the splash screen and waits for a few seconds. If you do not enter anything, it loads the default kernel image specified in the GRUB configuration file.
      • GRUB configuration file: /boot/grub/grub.conf (RHEL)
          [root@localhost ~]# ls -l /etc/grub.conf
          lrwxrwxrwx. 1 root root 22 Jun 18 17:32 /etc/grub.conf -> ../boot/grub/grub.conf
  • 7. GRand Unified Bootloader (GRUB)
          default=0
          timeout=20
          splashimage=(hd0,0)/grub/splash.xpm.gz
          title Red Hat Enterprise Linux (2.6.32-279.el6.x86_64)
          root (hd0,0)
          kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=UUID=2d281341-e060-
          initrd /initramfs-2.6.32-279.el6.x86_64.img
      • vmlinuz is the name of the Linux kernel executable. It is the first part of the operating system to be loaded into memory whenever the system boots.
      • initrd (initial RAM disk) is an image that contains the necessary compiled device drivers. It is loaded into memory as a temporary root file system and used until the kernel has booted and the real root file system is mounted.
  • 8. Kernel
      • The kernel mounts the root file system specified by "root=" in grub.conf. It then executes the /sbin/init program, which is responsible for creating all subsequent processes.
      • Because init is the first program executed by the Linux kernel, it has process ID 1.
          Ex: ps -eaf | grep init
      • The loaded kernel checks and organises memory, detects and sets up the device drivers and sets up the file systems. Processes are then spawned from init (PID 1).
  • 9. Kernel
      • Only the kernel communicates directly with the devices.
      • In a traditional RHEL system the kernel is delivered as an RPM package and is installed along with the OS.
      • The kernel typically contains device driver modules and the memory- and process-management algorithms that service requests.
  • 10. Init
      • Once the kernel executes /sbin/init, the init process starts. Init first reads the /etc/inittab file to determine the default run level, and later starts a getty program on each terminal port.
      • The run levels used are:
          0 - halt (do NOT set initdefault to this)
          1 - single-user mode
          2 - multi-user, without NFS (the same as 3 if you do not have networking)
          3 - full multi-user mode
          4 - unused
          5 - X11 (full graphical features such as display managers)
          6 - reboot (do NOT set initdefault to this)
  • 11. Init
      • To check the current and the default run level:
          [root@localhost ~]# who -r
          run-level 5 2013-07-22 12:19
          [root@localhost ~]# cat /etc/inittab | grep -i "initdefault:"
          id:5:initdefault:
      • On Ubuntu and Linux Mint this setting lives in /etc/init/rc-sysinit.conf:
          bharath-virtual-machine init # grep -i "env default_runlevel*" /etc/init/rc-sysinit.conf
          env DEFAULT_RUNLEVEL=5
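
The default run level lookup can also be done with a field-oriented tool. The following is a minimal sketch, assuming an inittab-style file in the classic id:runlevels:action:process layout. The sample content and the variable names are invented for illustration and are not taken from a real system.

```shell
# Sample /etc/inittab content (hypothetical; a real file has many more entries).
sample_inittab='# inittab sample
id:5:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l5:5:wait:/etc/rc.d/rc 5'

# Skip comment lines and print the runlevel field (2nd) of the entry
# whose action field (3rd) is "initdefault".
default_runlevel=$(printf '%s\n' "$sample_inittab" |
  awk -F: '!/^#/ && $3 == "initdefault" { print $2 }')

echo "default runlevel: $default_runlevel"
```

On a system that still uses SysV init, the same awk program could be pointed at /etc/inittab instead of the sample text.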
  • 12. Runlevel Programs
      • While a Linux system is booting you can see various services being started. These are the runlevel programs, executed for a specific run level.
      • Depending on the default init level setting, the system executes the programs of the matching run level.
  • 13. Runlevel Programs
      • If the system starts in run level 5, the scripts under /etc/rc5.d/ are executed. Names starting with "S" start a service and names starting with "K" (kill) stop one, for example:
          lrwxrwxrwx. 1 root root 14 Jul 23 10:31 K74ntpd -> ../init.d/ntpd
          lrwxrwxrwx. 1 root root 15 Jul 23 10:33 S85httpd -> ../init.d/httpd
      • If you want your own script to start automatically at boot time, add it to /etc/rc5.d/ under a name that starts with "S". The chkconfig utility in RHEL automates this.
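
As a rough illustration of how the S and K link names are used, the sketch below sorts a hypothetical list of /etc/rc5.d entries. It assumes the usual SysV convention that links are processed in lexical order, so the two-digit number after the letter decides the sequence. The service names other than ntpd and httpd are invented.

```shell
# Hypothetical link names as they might appear in /etc/rc5.d.
rc_links='K74ntpd
S85httpd
S10network
K05saslauthd
S55sshd'

# Names starting with K are stopped, names starting with S are started;
# within each group the numeric prefix gives the order.
kill_order=$(printf '%s\n' "$rc_links" | grep '^K' | sort | paste -s -d ' ' -)
start_order=$(printf '%s\n' "$rc_links" | grep '^S' | sort | paste -s -d ' ' -)

echo "stop:  $kill_order"
echo "start: $start_order"
```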
  • 15. Linux Boot Process
      • Once all the system processes have been started, init spawns a getty program (/sbin/mingetty) on each physical terminal port (terminals connected to the server by direct wire or modem).
      • The getty program displays the login prompt and waits for the user to enter a name.
      • As soon as the user types the username followed by Enter, getty starts the login program /bin/login to finish the login process and then goes away. These terminals appear as tty1, tty2, ...
      • For network connections (ssh, telnet, rlogin), user shells are attached to pseudo-terminals (pts/0, pts/1, ...) instead of going through getty.
  • 16. Linux Boot Process
      • When /bin/login begins execution, it displays the string "Password:" on the terminal and waits for you to type it.
      • Once you have typed the password, login verifies it against the user's entry in /etc/passwd (on modern systems the hashed password itself is kept in /etc/shadow). On success, the user's shell starts up.
      • Every user who has an account on the system has a separate line in the /etc/passwd file.
      • On failure, login displays an error message and init respawns getty.
  • 17. Linux Boot Process
      • After a successful login, /bin/login starts the login shell (/bin/sh), which is typically the last field of the user's entry in /etc/passwd.
      • The sh program looks for the /etc/profile file and executes its commands. This is the system-wide initialization file.
      • It is the first file executed after login. The message of the day (/etc/motd) and the last-login information are displayed as well.
      • After that, the .profile file in the user's home directory is executed. It is a user-defined initialization file, run once at login, where the environment and applications such as databases can be initialized.
  • 18. Linux Boot Process: PATH variable
      • The PATH variable is used to locate commands typed at the command line. It is a colon-separated list of directories that the shell searches from left to right when looking for a command.
      • If the command is not found in any of the directories listed in PATH, the shell prints an error message of the form "filename: not found".
      • Exporting PATH gives child processes access to it.
          [root@localhost ~]# echo $PATH
          /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
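
The PATH lookup can be tried out without touching any system directory. This is only a sketch: the directory comes from mktemp, the command name hello_demo is made up, and it assumes the temporary directory allows executing files.

```shell
# Create a throwaway directory holding a tiny command (hypothetical name).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo"\n' > "$demo_dir/hello_demo"
chmod +x "$demo_dir/hello_demo"

# Before the directory is on PATH, the shell cannot find the command.
if command -v hello_demo >/dev/null 2>&1; then before=found; else before=missing; fi

# Prepend the directory and export PATH so that child processes inherit it.
PATH="$demo_dir:$PATH"
export PATH
after=$(hello_demo)
child_sees=$(sh -c 'command -v hello_demo')

echo "before: $before"
echo "after:  $after"
echo "child:  $child_sees"
rm -rf "$demo_dir"
```

Because the new directory is placed first, it would also win over any later directory that contained a command of the same name, which is the left-to-right search described in the slide.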
  • 19. Linux Boot Process: alias
      • An alias is a shorthand notation provided by the shell to allow customization of commands.
      • The shell keeps a list of aliases that is searched when a command is entered. If the first word of a command line is an alias, it is replaced by the text of the alias.
      • An alias is defined with the alias command.
          Eg: alias ll='ls -l'
  • 22. Regular Expressions
      1) What is a regex
      2) Features
      3) Usage
  • 23. What is Regex. A regular expression is a notation that lets you search for text that fits a particular criterion. A single expression can select one or many data strings.
  • 24. Features. A large number of Unix utilities derive their power from regular expressions:
      1) File viewers and search utilities such as more, less and find. (Commands such as echo, ls, mv, rm and cp rely on shell wildcards, a related but different notation.)
      2) The grep family (grep, egrep and fgrep).
      3) Text editors such as vi, vim, nano, emacs, ed, jed and vile.
      4) String-processing languages such as awk, perl and python.
      5) The stream editor sed, for making changes to the input stream.
  • 25. Usage. Regular expressions are built mainly from metacharacters (special characters). There are four types of metacharacters:
      1) items that match a single character;
      2) items appended to provide "counting" (quantifiers);
      3) items that match positions;
      4) other metacharacters.
  • 26. Usage: 1. Items that match a single character
          .        (dot): matches any one character.
          [...]    (character class): matches any one character listed.
          [^...]   (negated character class): matches any one character not listed.
          \char    (escaped character): matches the literal character when char is a metacharacter.
  • 27. Usage
      1) A dot (.) in a regular expression matches any single character, no matter what it is.
          Ex: to match a date written as 15-08-1947, 15/08/1947 or 15.08.1947:
          grep "15.08.1947" <file_name>
      2) A character class [...] matches any one character in the list. A hyphen within the brackets indicates a range of consecutive characters.
          Ex: to search for the word "separate", including its common misspellings:
          grep "sep[ae]r[ae]te" <file_name>
          Ex: to match ranges of characters:
          grep "[0-9][A-Za-z]pple" <file_name>
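
The dot and the character class can be compared on a few sample lines. The input strings below are invented for this sketch; the counts show that the dot also accepts characters the author probably did not intend, which a class of the three separators avoids.

```shell
dates='15-08-1947
15/08/1947
15.08.1947
1500811947
16-08-1947'

# "." matches ANY single character, so the all-digit line matches as well.
dot_matches=$(printf '%s\n' "$dates" | grep -c '15.08.1947')

# A character class restricts the separator to -, / or a literal dot.
class_matches=$(printf '%s\n' "$dates" | grep -c '15[-/.]08[-/.]1947')

words='separate
seperate
separete
seperete
saparate'

# [ae] accepts either vowel in both positions, catching the misspellings.
spelling_matches=$(printf '%s\n' "$words" | grep -c 'sep[ae]r[ae]te')

echo "dot: $dot_matches, class: $class_matches, spellings: $spelling_matches"
```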
  • 28. Usage. Besides ordinary character classes there are POSIX bracket constructs, in three categories:
      1) POSIX character classes consist of keywords bracketed by [: and :]. The keywords describe alphabetic characters, control characters and so on.
          Ex: grep "[[:digit:]]apple" <file_name>
      2) POSIX collating symbols are multi-character sequences that should be treated as a unit. For example, there are locales in which the two characters "ch" are treated as one unit. A collating symbol is bracketed by [. and .] and works only in such a locale.
          Ex: grep "[ab[.ch.]de]at" <file_name>
      3) POSIX equivalence classes are sets of characters that should be considered equivalent, such as e, è, ê and é in a French locale. The character is bracketed by [= and =]; this also works only in a suitable locale.
          Ex: grep "[a[=e=]iou]live" <file_name>
  • 29. Usage
      • Within a bracket expression all other metacharacters lose their special meaning (a literal ] must come first in the list).
          Ex: grep "[]*.?]" <file_name>
      • [^...] (negated character class): if you use [^...] instead of [...], the class matches any character that is not in the list.
          Ex: grep "[^1-6]apple" <file_name>
      • \char (escaped character):
      1) A backslash followed by a metacharacter matches that character literally.
          Ex: grep "\*" <file_name>
  • 30. Usage
      2) A backslash followed by certain non-metacharacters has an implementation-defined meaning. For example, \< often means the start of a word and \> the end of a word.
          Ex: grep "\<0\>" <file_name>
      3) A backslash followed by any other character defaults to simply matching that character.
          Ex: mkdir regex\&awk\&sed
      • A backslash within a character class is not special at all, so it provides no escape services there.
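
A short sketch of the POSIX class, the negated class and the two ways of matching a metacharacter literally. The sample lines and variable names are assumptions made for the demonstration.

```shell
lines='3apple
7apple
xapple
a*b
a.b
axb'

# POSIX character class: a digit followed by "apple".
digit_apple=$(printf '%s\n' "$lines" | grep -c '[[:digit:]]apple')

# Negated class: any character except 1-6 followed by "apple".
not_1_to_6=$(printf '%s\n' "$lines" | grep '[^1-6]apple' | paste -s -d ' ' -)

# Escaped metacharacter: \* matches a literal asterisk.
literal_star=$(printf '%s\n' "$lines" | grep 'a\*b')

# Inside brackets * . ? are literal; the ] is listed first to be literal too.
in_brackets=$(printf '%s\n' "$lines" | grep -c 'a[]*.?]b')

echo "$digit_apple / $not_1_to_6 / $literal_star / $in_brackets"
```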
  • 31. Usage: 2. Items appended to provide "counting" (quantifiers)
          ?            (question mark): matches zero or one occurrence of the preceding character or regular expression, i.e. one allowed, but optional.
          *            (asterisk): matches zero or more occurrences of the preceding character or regular expression, i.e. any number allowed, but all optional.
          +            (plus): matches one or more occurrences of the preceding character or regular expression, i.e. one required, additional ones optional.
          {min,max}    (specified range): min required and max allowed.
  • 32. Usage: ? (question mark)
      • Matches zero or one occurrence of the preceding character or regular expression: one is allowed, but it is optional.
      • An item followed by "?" always succeeds, whether or not the text contains it.
          Ex: to match both "color" and "colour", use colou?r:
          egrep -i "colou?r" <file_name>    (or grep -E)
      • The u? part is always successful: sometimes it matches a "u" in the text and sometimes it does not. The whole point of the optional part is that it succeeds either way.
  • 33. Usage: * (asterisk)
      • Matches zero or more occurrences of the preceding character or regular expression: any number is allowed, but all are optional.
      • "*" means any number of the item, including none: try to match as many as possible, but settle for nothing if need be.
      • An easily understood example: the HTML specification allows spaces immediately before the closing ">" of a tag. The expression below matches zero or more spaces before the closing ">":
          [root@localhost ~]# egrep -i "<h[0-9] *>" htm
          <H1 > linux </H1 >
          <H2 > linux </H2 >
          <H3 > linux </H3 >
          <H4> linux </H4 >
  • 34. Usage: + (plus)
      • Matches one or more occurrences of the preceding character or regular expression: one is required, additional ones are optional.
      • The plus is similar to "?" and "*" in that it tries to match as many times as possible; the difference is that it fails if it cannot match at least once.
      • In this example it matches one or more spaces before the closing ">":
          [root@localhost ~]# egrep -i "<h[0-9] +>" htm
          <H1 > linux </H1 >
          <H2 > linux </H2 >
          <H3 > linux </H3 >
      • The next example combines all three quantifiers: ?, + and *.
  • 35. Usage. Example: search for an HTML tag such as <HR SIZE=14>.
      • Assumption 1: there must be at least one space between HR and SIZE, so we use "+".
      • Assumption 2: there may be additional spaces before or after the equals sign (=), so we use "*".
      • Assumption 3: sometimes we also need to match a bare <HR> or <HR >, so we use "?" to make the SIZE part optional.
          [root@localhost ~]# egrep -i "<HR +SIZE *= *[0-9]+ *>" htm
          <HR SIZE = 14>
          <HR SIZE = 15>
          <HR SIZE=16>
      • To also match <HR> and <HR >, the RE becomes:
          egrep -i "<HR( +SIZE *= *[0-9]+)? *>" htm
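
The three quantifiers can be checked against a handful of tags. The sample input is invented for this sketch and deliberately includes lines that should not match, such as a tag without the mandatory space and one without digits.

```shell
# ? makes the "u" optional, so both spellings match but "colouur" does not.
colour_matches=$(printf 'color\ncolour\ncolouur\n' | grep -E -c 'colou?r')

tags='<HR SIZE=14>
<HR  SIZE = 15 >
<HRSIZE=16>
<HR>
<HR >
<HR SIZE=>'

# + requires at least one space after HR; * allows any number around "=".
with_size=$(printf '%s\n' "$tags" | grep -E -i -c '<HR +SIZE *= *[0-9]+ *>')

# Wrapping the SIZE part in ( ... )? makes the whole attribute optional.
optional_size=$(printf '%s\n' "$tags" | grep -E -i -c '<HR( +SIZE *= *[0-9]+)? *>')

echo "colou?r: $colour_matches, with SIZE: $with_size, SIZE optional: $optional_size"
```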
  • 36. Usage: {min,max} (specified range): min required and max allowed
          {n}      exactly n occurrences of the preceding character or regular expression.
          {n,}     at least n occurrences of the preceding character or regular expression.
          {n,m}    between n and m occurrences of the preceding character or RE.
      • With egrep (grep -E) the braces are written as shown; with plain grep they must be escaped as \{ and \}. An unanchored pattern also matches inside a longer run, so add anchors such as ^ and $ if longer runs must be rejected.
          Eg 1: egrep "[[:upper:]]{3}" <file>      matches 3 consecutive uppercase letters
          Eg 2: egrep "[[:upper:]]{3,}" <file>     matches at least 3 consecutive uppercase letters
          Eg 3: egrep "[[:upper:]]{4,6}" <file>    matches a run of 4 to 6 uppercase letters
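
A small sketch of interval expressions on invented input. It contrasts an unanchored interval, which also matches inside longer runs, with anchored versions that accept only whole lines of the requested length.

```shell
codes='AB
ABC
ABCD
ABCDEFG'

# Unanchored: every line that CONTAINS three uppercase letters in a row.
contains_three=$(printf '%s\n' "$codes" | grep -E -c '[[:upper:]]{3}')

# Anchored with ^ and $: the whole line must be exactly three letters.
exactly_three=$(printf '%s\n' "$codes" | grep -E '^[[:upper:]]{3}$')

# {3,} means three or more; {4,6} means between four and six.
three_or_more=$(printf '%s\n' "$codes" | grep -E -c '^[[:upper:]]{3,}$')
four_to_six=$(printf '%s\n' "$codes" | grep -E '^[[:upper:]]{4,6}$')

echo "$contains_three / $exactly_three / $three_or_more / $four_to_six"
```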
  • 37. Usage: 3. Items that match positions
          ^     (caret): matches the position at the start of the line.
          $     (dollar): matches the position at the end of the line.
          \<    (word boundary): matches the position at the start of a word.
          \>    (word boundary): matches the position at the end of a word.
  • 38. Usage
      • ^ (caret) matches the position at the start of the line.
          Eg: to list only the regular files or only the directories under the current directory:
          Regular files: ls -lrt | grep "^-"
          Directories: ls -lrt | grep -i "^d"
      • $ (dollar) matches the position at the end of the line.
          Eg: to show the lines of a file that end with the letter "a" or "A":
          cat <file_name> | grep -i "a$"
      • \< and \> (word boundaries) match the exact start and end of a word.
          Eg: to see the init process by its exact name:
          ps -eaf | grep -i "\<init\>"
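
The position metacharacters can be tried on canned text instead of live ls and ps output. The listing and process names below are made up, and the \< and \> word boundaries are assumed to be available as they are in GNU grep.

```shell
listing='-rw-r--r-- 1 root root 512 Jun 18 17:32 stage1
drwxr-xr-x 2 root root 4096 Jun 18 17:32 grub
lrwxrwxrwx 1 root root 22 Jun 18 17:32 grub.conf -> ../boot/grub/grub.conf'

# ^ anchors at the start of the line: "-" marks a regular file, "d" a directory.
regular_files=$(printf '%s\n' "$listing" | grep -c '^-')
directories=$(printf '%s\n' "$listing" | grep -c '^d')

# $ anchors at the end of the line; -i makes "a" match "A" as well.
ends_with_a=$(printf 'Linux\nalpha\nDELTA\n' | grep -i -c 'a$')

procs='/sbin/init
/usr/sbin/initctl-monitor
xinitrc'

# \< and \> require "init" to stand alone as a word.
exact_init=$(printf '%s\n' "$procs" | grep '\<init\>')

echo "$regular_files / $directories / $ends_with_a / $exact_init"
```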
  • 39. Usage: 4. Other metacharacters
          |           (alternation): matches either expression it separates.
          (...)       (parentheses): limits the scope of alternation, provides grouping for the quantifiers and "captures" for backreferences.
          \1, \2 ...  (backreference): matches the text previously matched within the first, second, ... set of parentheses.
  • 40. Usage
      • | (alternation) matches either of the expressions it separates. It combines multiple sub-expressions into a single expression that matches any one of them. Parentheses can be used to constrain the alternation.
          Eg: to match the words "grey" and "gray":
          egrep -i "grey|gray" <file_name>
          egrep -i "gr(a|e)y" <file_name>
          egrep -i "(Fir|1)st * [Ss]treet" <file_name>
      • ( . . . ) (parentheses) provide grouping for the quantifiers (*, +, ?) and "captures" for backreferences.
          Eg: egrep -i "<HR( +SIZE *= *[0-9]+)? *>" htm
  • 41. Usage: \1, \2 ... (backreference)
      • Matches the text previously matched within the first, second, ... set of parentheses.
      • Example: to match the same text twice on a line, put the first RE in parentheses and refer to what it matched as \1 later in the same expression.
          egrep -i "([a-z]+) +\1" file_name
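
The sketch below exercises alternation and a backreference on invented sentences. It assumes GNU grep, which accepts backreferences and the \< \> word boundaries in extended (-E) patterns. The second half shows why the boundaries are worth adding: without them the pattern also fires inside "this is".

```shell
# Alternation constrained by parentheses: gray or grey, but not griy.
grey_gray=$(printf 'grey\ngray\ngriy\n' | grep -E -c 'gr(a|e)y')

text='this is is a test
no repeats here
the the end'

# \1 must repeat exactly what the first group matched: a doubled word.
doubled=$(printf '%s\n' "$text" | grep -E -i -c '\<([a-z]+) +\1\>')

# Without word boundaries, "is" at the end of "this" pairs with the next "is".
loose_hit=$(printf 'this is a test\n' | grep -E -i -c '([a-z]+) +\1' || true)
strict_hit=$(printf 'this is a test\n' | grep -E -i -c '\<([a-z]+) +\1\>' || true)

echo "$grey_gray / $doubled / loose=$loose_hit strict=$strict_hit"
```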
  • 42. Usage: real-world regular expression exercises
      Q1: How do you match a dollar amount, optionally including cents? Examples: $0.95, $1, $181989.
      Q2: How do you match a number, either integer or floating point, with an optional minus sign, any number of digits, and an optional decimal point followed by any number of digits?
      Q3: How do you match a time in 12-hour format, such as 09:17am or 12:30pm?
      Q4: How do you match a time in 24-hour format?
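
The slides leave these four questions open. The patterns below are one possible set of answers, not the presenter's own: they are assumptions written for grep -E, anchored so that the whole string must match, and the helper function name is hypothetical.

```shell
# Candidate patterns; each one is anchored with ^ and $.
dollar='^\$[0-9]+(\.[0-9][0-9])?$'                 # Q1: $1, $0.95, $181989
number='^-?[0-9]+(\.[0-9]*)?$'                     # Q2: 42, -3.14, 7.
time12='^(0?[1-9]|1[0-2]):[0-5][0-9] ?(am|pm)$'    # Q3: 09:17am, 12:30pm
time24='^([01]?[0-9]|2[0-3]):[0-5][0-9]$'          # Q4: 00:00 to 23:59

# matches PATTERN STRING: succeeds when STRING matches PATTERN.
matches() {
  printf '%s\n' "$2" | grep -E -q -e "$1"
}

for amount in '$0.95' '$1' '$181989' '$1.5'; do
  if matches "$dollar" "$amount"; then
    echo "$amount: dollar amount"
  else
    echo "$amount: no match"
  fi
done

for t in '09:17am' '12:30pm' '13:00pm'; do
  if matches "$time12" "$t"; then echo "$t: 12-hour time"; else echo "$t: no match"; fi
done
```

The hour part is the interesting design choice: writing it as an alternation of explicit digit ranges rejects values such as 13 in 12-hour format or 24 in 24-hour format, which a plain [0-9][0-9] would accept.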
  • 43. Usage: regular expressions in the vi editor
          /apple/          finds the specified word in the file opened in vi.
          /^apple/         finds only those lines where the RE apple matches at the beginning of the line.
          /apple$/         finds those lines where the RE apple matches at the end of the line.
          /ap.le/          finds lines containing ap followed by any single character, followed by l and e.
          /ap*le/          matches a, then zero or more occurrences of the letter p, followed by l and e.
          /[aA]pple/       matches an uppercase or lowercase a followed by pple. Ranges of characters can also be used: /appl[a-zA-Z]/ and /appl[^a-zA-Z]/.
          /\<apple\>/      matches the exact word apple; \< and \> mark the start and end of a word.
  • 44. Usage: regular expressions in the vi editor
          :1,$s/\<[aA]pple\>/Steve/g
      • vi replaces every occurrence of the word apple or Apple in the entire file with Steve.
          :1,$s/\([0o]ccur\)ence/\1rence/g
      • vi corrects the misspelled word "occurence" to "occurrence" using a remembered pattern.
          :s/\(apple\) and \(steve\)/\2 and \1/
      • vi searches for the RE "apple and steve" and tags apple as \1 and steve as \2. On the replacement side the contents of register 2 are substituted for \2 and the contents of register 1 for \1.
          /5\{2\}2\{3\}\./
      • vi searches for lines containing two occurrences of the digit 5, followed by three occurrences of the digit 2, followed by a literal period.
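
vi itself is interactive, so the sketch below uses sed as a stand-in: its s command accepts the same basic-regex syntax with \( \) groups and \1 references. The input sentences are invented, and the \< \> word boundaries are assumed to be supported as they are in GNU sed and GNU grep.

```shell
# Remembered pattern: keep "occur" (or "0ccur") and append the corrected ending.
fixed_spelling=$(printf 'one occurence, 0ccurence two\n' |
  sed 's/\([0o]ccur\)ence/\1rence/g')

# Two groups swapped on the replacement side.
swapped=$(printf 'apple and steve\n' |
  sed 's/\(apple\) and \(steve\)/\2 and \1/')

# Whole-word replacement: "pineapple" must stay untouched.
whole_word=$(printf 'Apple pie, pineapple, apple\n' |
  sed 's/\<[aA]pple\>/Steve/g')

# Interval search: two 5s, three 2s, then a literal period.
interval_hits=$(printf '55222.\n5522.\n' | grep -c '5\{2\}2\{3\}\.')

echo "$fixed_spelling"
echo "$swapped"
echo "$whole_word"
echo "$interval_hits"
```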
  • 46. AWK. AWK is a programming language used for manipulating data and generating reports.
      1) What awk is, and its built-in variables
      2) How awk works with records and fields
      3) Reading input from files or commands
      4) Printing output and a simple awk script
      5) Real-world examples used in our Airbus environment
  • 47. AWK: 1. What awk is, and its built-in variables
      • AWK is named after the initials of the last names of its authors: Alfred Aho, Peter Weinberger and Brian Kernighan.
      • AWK instructions consist of a combination of patterns and actions.
          Syntax: awk 'pattern {action}' file_name
      • An awk script is generally made up of three sections: BEGIN, the body and END. None of these sections is mandatory.
      • The instructions in the BEGIN section are executed before any input is read, whereas those in the END section are executed after all input has been read.
          BEGIN { ... }
          pattern { action } statements (the body)
          END { ... }
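
A minimal sketch of the three sections working together, fed from an invented three-line input. The field meanings (name and score) and the threshold of 20 are assumptions chosen only to give the body a pattern to test.

```shell
report=$(printf 'alice 10\nbob 20\ncarol 30\n' | awk '
BEGIN    { print "name score" }            # runs once, before any input is read
$2 >= 20 { print $1, $2; total += $2 }     # body: runs for each record matching the pattern
END      { print "total", total }          # runs once, after all input is read
')

echo "$report"
```

Dropping the BEGIN or END line still leaves a valid program, which is what "none of these sections is mandatory" means in practice.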
  • 48. AWK built-in variables
          Variable   Meaning
          FS         Input field separator (default: blanks and tabs)
          OFS        Output field separator (default: a single blank)
          RS         Input record separator (default: newline)
          ORS        Output record separator (default: newline)
          NF         Number of fields in the current input record
          NR         Number of the current record
          FNR        Number of the current record within the current file; incremented each time a new record is read and reset each time a new input file starts
  • 49. AWK: 2. How awk works with records and fields. Records (lines: NR, FNR, RS, ORS)
      • Each line of input is called a record and is terminated by a newline. NR holds the number of the current record.
          Eg: awk '{ print NR, $1, $2 }' <file_name>
          Eg: awk '{ print NR, $0 }' <file_name>
      • The difference between NR and FNR: FNR restarts for every input file, whereas NR keeps counting across files.
          Eg: awk '{ print FNR; print NR; print $0 }' <file_1> <file_2>
      • Record separator: by default the input and output record separators are both a newline. They are stored in the variables RS and ORS and can be changed.
      • $0 variable: awk refers to the entire record as $0.
      • An example that illustrates RS and ORS in a single command line:
          Eg: awk 'BEGIN { RS=":" } {ORS="-----------------"}{print }' /etc/passwd
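
To see NR and FNR diverge, awk needs at least two input files. The sketch below creates two throwaway files with mktemp (their names and contents are invented) and then changes RS and ORS on a short colon-separated string.

```shell
work=$(mktemp -d)
printf 'a1\na2\n' > "$work/file1"
printf 'b1\nb2\nb3\n' > "$work/file2"

# NR counts records across all files; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, $0 }' "$work/file1" "$work/file2")

# RS=":" splits the input at colons; ORS="-" is written after every record.
rejoined=$(printf 'root:x:0:0' | awk 'BEGIN { RS = ":"; ORS = "-" } { print }')

echo "$counters"
echo "$rejoined"
rm -rf "$work"
```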
  • 50. AWK: 2. How awk works with fields and records. Fields (NF, FS, OFS)
      • Each record consists of words called fields, which by default are separated by white space (blanks or tabs). AWK keeps track of the number of fields in NF.
      • Each field is referred to by a dollar sign ($) followed by its number: $1, $2, $3 and so on.
          E.g.: awk '{ print $1, $2, $3 }' <file_name>
      • Field separator: by default the input field separator (FS) is white space and the output field separator (OFS) is a single blank. Both can be changed.
          Eg: awk 'BEGIN{FS=":";OFS="*******"}{print $1,$2,$3,$4,$5}' <file>
      • To change the value of FS you can also use the -F option followed by the character representing the new separator.
          Eg: awk -F: '{print $1,$5}' /etc/passwd
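
A short sketch of field splitting. The passwd-style line is a made-up account, not a real entry, and is only there to give FS and OFS something to work on.

```shell
# Default FS: leading blanks are ignored and runs of blanks or tabs separate fields.
default_nf=$(printf '  two   words\there\n' | awk '{ print NF }')

passwd_line='demo:x:500:500:Demo User:/home/demo:/bin/bash'

# -F: sets the input separator; the comma in print inserts OFS (a single blank).
name_and_gecos=$(printf '%s\n' "$passwd_line" | awk -F: '{ print $1, $5 }')

# Setting both FS and OFS in BEGIN changes how fields are read AND written.
custom_ofs=$(printf '%s\n' "$passwd_line" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print $1, $5, NF }')

echo "$default_nf"
echo "$name_and_gecos"
echo "$custom_ofs"
```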
  • 51. AWK: 3. Reading input from files or commands
      Input from files
      • awk is input-driven: its main rules run once for every record it reads. If you do not pass a file name, awk waits for input on standard input. E.g.: awk '{ print "welcome to RHEL" }' <<file_name>> prints the message once per line of the file; the same command without a file name waits for you to type input.
      • A program that consists only of a BEGIN block runs without reading any input. E.g.: awk 'BEGIN { print "welcome to RHEL" }'
      Input from commands
      • The output of a UNIX/Linux command can be piped to awk for processing. Shell programs commonly use awk to manipulate command output. E.g.: who | awk '{ print $1 }' E.g.: cal | awk 'NR==1 { print $0 }'
      • To count the fields that the default separator produces on each line of a configuration file such as /etc/passwd: awk '{ print "No. of fields: " NF }' /etc/passwd
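The difference between a BEGIN-only program and a main rule can be sketched as follows. Here printf stands in for a real command such as who or cal, so the piped text is an assumption made for a reproducible illustration.

```shell
# A BEGIN-only program runs once and does not read any input.
greeting=$(awk 'BEGIN { print "welcome to RHEL" }')
echo "$greeting"
# welcome to RHEL

# A main rule runs once per input record; two piped lines give two messages.
per_line=$(printf 'one\ntwo\n' | awk '{ print "welcome to RHEL" }')
echo "$per_line"

# Select only the first record of piped command output.
first=$(printf 'header line\nbody line\n' | awk 'NR == 1 { print $0 }')
echo "$first"
# header line
```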
  • 52. AWK: 4. Formatting output and a simple awk script
      • The action part of an awk rule is enclosed in curly braces. If a pattern is given without an action, awk takes the default action, which is to print the lines that match.
      • The print statement produces simple output. For more sophisticated formatting, use printf, or sprintf to build a formatted string; both are modelled on C. E.g.: awk -F: '{ print "the real name of: " $1 " is \t" $5 }' /etc/passwd E.g.: awk -F: '{ printf "the real name of %s is \t %s\n", $1, $5 }' /etc/passwd
      • E.g.: awk 'BEGIN { print "Month Date"; print "----- ----" } { OFS=" " } { ORS="\n\n\n" } { print $1, $2 }' <<file>>
      • If an awk program spans multiple lines, we can put it in a separate script file and run that file with the -f option. E.g.: awk -f <script_contains_awk_structure> <Args>
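A short sketch of printf formatting and of running a program from a script file with -f. The names passwd_sample and names.awk are hypothetical, as are the column widths; they are one possible layout, not the one used in the original talk.

```shell
# Hypothetical sample data and script file, created only for this illustration.
tmp=$(mktemp -d)
printf 'root:x:0:0:Super User:/root:/bin/bash\n' > "$tmp/passwd_sample"

# printf gives explicit control over layout; unlike print it adds no newline by itself.
formatted=$(awk -F: '{ printf "%-8s -> %s\n", $1, $5 }' "$tmp/passwd_sample")
echo "$formatted"

# The same kind of program stored in a file and run with -f.
cat > "$tmp/names.awk" <<'EOF'
BEGIN { FS = ":"; print "Login    Real name"; print "-----    ---------" }
{ printf "%-8s %s\n", $1, $5 }
EOF
report=$(awk -f "$tmp/names.awk" "$tmp/passwd_sample")
echo "$report"

rm -rf "$tmp"
```

Keeping the program in a file avoids shell quoting problems once it grows beyond a single line.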
  • 53. AWK: 5. Real-time examples of awk used in our Airbus environment
      Q1: How do we find the total size of the files from a particular month, or of all files?
      Q2: How do we kill a large number of zombie processes with a single command line?
      Q3: How do we back up all the log files on a machine with a single command line?
      Q4: How do we get a sorted list of the login names of all users on a UNIX/Linux machine?
  • 54. AWK: References
      • Effective awk Programming, Third Edition, by Arnold Robbins
      • sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins
      • http://www.gnu.org/software/gawk/manual/gawk.html
      • http://www.tldp.org/
  • 55. SED: sed, a stream editor for filtering and transforming text
      1. How does sed work?
      2. Basic sed commands
      3. Simple sed script and regex in sed
      4. Real-time examples used in our Airbus environment
  • 56. SED: 1. How does sed work?
      1. sed reads its input one line at a time. Once a line is read, it is placed in an area of memory called the pattern space.
      2. All editing operations are applied to the contents of the pattern space. When all operations have been completed, sed prints the result to standard output.
      3. After the line has been processed, it is removed from the pattern space, and the next line is read into the pattern space, processed, and displayed.
      4. sed ends when the last line of the input file has been processed.
      Example:
      Input:                                                     The Unix System
      Pattern space after s/Unix/UNIX/:                          The UNIX System
      Pattern space after s/UNIX System/UNIX Operating System/:  The UNIX Operating System
      Output:                                                    The UNIX Operating System
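The pattern-space walk-through on this slide can be reproduced with two substitutions applied in order. This is a minimal sketch; the input line is piped in with printf rather than read from a file.

```shell
# Each -e command edits the pattern space left behind by the previous one.
result=$(printf 'The Unix System\n' |
  sed -e 's/Unix/UNIX/' -e 's/UNIX System/UNIX Operating System/')
echo "$result"
# The UNIX Operating System
```

The second substitution only matches because the first one has already changed "Unix" to "UNIX" in the pattern space, which is why the order of the commands matters.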
  • 57. SED: 2. Basic sed commands
      1. Substitution (s command). Syntax: sed 's/pattern1/pattern2/' <file>. Substitutes the first occurrence of pattern1 on each line with pattern2; add the g flag to replace every occurrence on the line. E.g.: cat /etc/passwd | sed 's/root/sooperuser/' (or) sed 's/sooperuser/root/g' /etc/passwd
      2. Print (p command). Syntax: sed '/root/p' <<file>>. Prints the contents of the pattern space. By default sed already prints every line, so without the -n option the matching lines appear twice; with -n and p together, only the selected lines are printed. E.g.: sed '/root/p' <file> (or) sed -n '/root/p' <file>
      3. Delete (d command). Removes the line currently in the pattern space, so it is not printed. E.g.: sed '3d' <filename> deletes the 3rd line; sed '3,$d' <file> deletes from the 3rd line to the end; cat /etc/passwd | sed '/^root/d' deletes the lines that start with root.
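A sketch of the s, p, and d commands on a small file. passwd_sample and the replacement word admin are hypothetical choices for the illustration, so that the real /etc/passwd is not involved.

```shell
# Hypothetical two-line passwd-style file, created only for this illustration.
tmp=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' > "$tmp/passwd_sample"

# Without g only the first match on each line is replaced.
first_only=$(sed 's/root/admin/' "$tmp/passwd_sample" | head -n 1)
echo "$first_only"
# admin:x:0:0:root:/root:/bin/bash

# With g every match on the line is replaced.
every=$(sed 's/root/admin/g' "$tmp/passwd_sample" | head -n 1)
echo "$every"
# admin:x:0:0:admin:/admin:/bin/bash

# -n with p prints only the matching lines; without -n they appear twice.
printed=$(sed -n '/root/p' "$tmp/passwd_sample")
twice=$(sed '/root/p' "$tmp/passwd_sample" | grep -c root)
echo "$printed"
echo "$twice"
# 2

# d drops the matching lines from the output.
deleted=$(sed '/^root/d' "$tmp/passwd_sample")
echo "$deleted"

rm -rf "$tmp"
```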
  • 58. SED: 2. Basic sed commands
      4. Multiple edits (-e option). Use -e when you have more than one editing command to execute; all edits are applied, in order, to the line in the pattern space. E.g.: sed -e '1,3d' -e 's/tim/larry/g' <<file>>
      5. Appending (a command). Places new text below the current line in the output. E.g.: sed '/^root/a THE CURRENT USER IS SUPER USER' <<file>>
      6. Inserting (i command). Places new text above the current line in the output. E.g.: sed '/^root/i THE CURRENT USER IS SUPER USER' <<file>>
      7. Changing (c command). Replaces the current line with new text. E.g.: sed '/^root/c THE CURRENT USER IS SUPER USER' <<file>>
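A sketch of -e, a, i, and c on a hypothetical two-line file named users. The one-line forms shown on the slide (text on the same line as a, i, or c) are a GNU sed extension; the block below uses the portable form, with a backslash and the text on the next line, which GNU sed also accepts.

```shell
# Hypothetical sample file, created only for this illustration.
tmp=$(mktemp -d)
printf 'root:x:0:0\ntim:x:1000:1000\n' > "$tmp/users"

# Two edits in one run: delete line 1, then substitute on what is left.
multi=$(sed -e '1d' -e 's/tim/larry/g' "$tmp/users")
echo "$multi"
# larry:x:1000:1000

# a places the text after the matching line.
appended=$(sed '/^root/a\
THE CURRENT USER IS SUPER USER' "$tmp/users")
echo "$appended"

# i places the text before the matching line.
inserted=$(sed '/^root/i\
THE CURRENT USER IS SUPER USER' "$tmp/users")
echo "$inserted"

# c replaces the matching line with the text.
changed=$(sed '/^root/c\
THE CURRENT USER IS SUPER USER' "$tmp/users")
echo "$changed"

rm -rf "$tmp"
```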
  • 59. SED: 3. Simple sed script and regex in sed
      • A sed script is a list of sed commands stored in a file. To run it from the command line, use the -f option followed by the name of the script. E.g.: sed -f sedscr sed_sample
      • If a file needs many edits, we can put all of them in a script file, here called sedscr:
        [root@localhost ~]# cat sedscr
        s/torvalds/steve/g
        s/FL/USA/
        [root@localhost ~]# cat sed_sample
        Kernel is dvpd by linus torvalds , torvalds from FL
        After script execution (sed -f sedscr sed_sample), o/p: Kernel is dvpd by linus steve , steve from USA
      • sed does not change the original file: in the example above, sed -f sedscr sed_sample leaves the sed_sample file untouched.
      • Instead it sends all the lines to standard output. To save the result we have to capture the output in a new file. E.g.: sed -f sedscr sed_sample > sed_newfile
      • The diff program shows the differences between the two files. E.g.: diff sed_sample sed_newfile
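The script-file workflow from this slide can be run end to end as below. The files are created in a temporary directory, which is an assumption made so that the sketch is self-contained; the script and sample text are the ones shown on the slide.

```shell
# Recreate the slide's script and sample file in a temporary directory.
tmp=$(mktemp -d)
printf 's/torvalds/steve/g\ns/FL/USA/\n' > "$tmp/sedscr"
printf 'Kernel is dvpd by linus torvalds , torvalds from FL\n' > "$tmp/sed_sample"

# Run the script and capture the edited text in a new file.
sed -f "$tmp/sedscr" "$tmp/sed_sample" > "$tmp/sed_newfile"
new_content=$(cat "$tmp/sed_newfile")
echo "$new_content"
# Kernel is dvpd by linus steve , steve from USA

# The input file itself is unchanged.
original=$(cat "$tmp/sed_sample")
echo "$original"
# Kernel is dvpd by linus torvalds , torvalds from FL

# diff exits non-zero when the files differ, hence the "|| true".
changes=$(diff "$tmp/sed_sample" "$tmp/sed_newfile" || true)
echo "$changes"

rm -rf "$tmp"
```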
  • 60. SED: 3. Regular expressions in sed
      • The following example matches all the lines that start with root and deletes them. E.g.: cat /etc/passwd | sed '/^root/d'
      • The following example deletes all the lines that end with sh. E.g.: cat /etc/passwd | sed '/sh$/d'
      • The following command prints only those lines in the /etc/passwd file that start with a letter of the alphabet. E.g.: cat /etc/passwd | sed -n '/^[[:alpha:]]/p'
      • The following command surrounds the area code (the first three digits) of a telephone number with parentheses for easier reading. E.g.: sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt
      • Building on the example above, we can make the phone number even easier to read. E.g.: sed -e 's/^[[:digit:]]\{3\}/(&)/g' -e 's/)[[:digit:]]\{3\}/&-/g' phone.txt
        o/p: (555)555-1212 (555)555-1213
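A sketch of the regular-expression examples, with the input piped in through printf. The sample lines are hypothetical and stand in for /etc/passwd and phone.txt; the phone numbers are chosen to match the output shown on the slide.

```shell
# Keep only the lines that start with a letter; "_" and "9" are not in [[:alpha:]].
alpha_lines=$(printf 'root:x\n_apt:x\n9lives:x\n' | sed -n '/^[[:alpha:]]/p')
echo "$alpha_lines"
# root:x

# Delete the lines that end in "sh".
no_shell=$(printf 'a:/bin/bash\nb:/sbin/nologin\n' | sed '/sh$/d')
echo "$no_shell"
# b:/sbin/nologin

# \{3\} is the basic-regex interval for "exactly three"; & is the matched text.
phones=$(printf '5555551212\n5555551213\n' |
  sed -e 's/^[[:digit:]]\{3\}/(&)/' -e 's/)[[:digit:]]\{3\}/&-/')
echo "$phones"
# (555)555-1212
# (555)555-1213
```

In sed's default basic regular expressions the interval braces need backslashes; written as plain {3} they would be matched literally.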
  • 61. SED: 4. Real-time examples used in our Airbus environment
      Q1: How do we remove the content for a particular month from a large log file?
      Q2: How do we back up the list of directories under the current directory (using sed's substitution)?
      Q3: How do we remove all the files that have white space in their file names?
  • 62. SED: References
      • sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins
      • http://www.grymoire.com/Unix/Sed.html
      • http://www.tldp.org/
  • 63. Thank you