Unix Quick Learn


Learn Unix Commands in a day

Published in: Technology

  1. 1. UNIX Overview

The UNIX operating system was designed to let a number of programmers access the computer at the same time and share its resources. The operating system coordinates the use of the computer's resources, allowing one person, for example, to run a spell check program while another creates a document, another edits a document, another creates graphics, and another formats a document -- all at the same time, with each user oblivious to the activities of the others.

The operating system controls all of the commands from all of the keyboards and all of the data being generated, and permits each user to believe he or she is the only person working on the computer. This time-sharing of resources makes UNIX one of the most powerful operating systems ever.

Although UNIX was developed by programmers for programmers, it provides an environment so powerful and flexible that it is found in businesses, sciences, academia, and industry. Many telecommunications switches and transmission systems are also controlled by administration and maintenance systems based on UNIX.

While initially designed for medium-sized minicomputers, the operating system was soon moved to larger, more powerful mainframe computers. As personal computers grew in popularity, versions of UNIX found their way into these boxes, and a number of companies produce UNIX-based machines for the scientific and programming communities.

The uniqueness of UNIX

The features that made UNIX a hit from the start are:

· Multitasking capability
· Multiuser capability
· Portability
· UNIX programs
· Library of application software

Multitasking

Many computers do just one thing at a time, as anyone who uses a PC or laptop can attest. Try logging onto your company's network while opening your browser and a word processing program. Chances are the processor will freeze for a few seconds while it sorts out the multiple instructions.
UNIX, on the other hand, lets a computer do several things at once, such as printing out one file while the user edits another file. This is a major feature for users, since users don't have to wait for one application to end before starting another one.

Multiusers

The same design that permits multitasking permits multiple users to use the computer. The computer can take the commands of a number of users -- determined by the design of the computer -- to run programs, access files, and print documents at the same time. The computer can't tell the printer to print all the requests at once, but it does prioritize the requests to keep everything orderly. It also lets several users access the same document by compartmentalizing the document so that the changes of one user don't override the changes of another user.
  2. 2. System portability

A major contribution of the UNIX system was its portability, permitting it to move from one brand of computer to another with a minimum of code changes. At a time when different computer lines of the same vendor didn't talk to each other -- let alone machines of multiple vendors -- that meant a great saving in both hardware and software upgrades. It also meant that the operating system could be upgraded without having all the customer's data entered again. And new versions of UNIX were backward compatible with older versions, making it easier for companies to upgrade in an orderly manner.

UNIX tools

UNIX comes with hundreds of programs that can be divided into two classes:

· Integral utilities that are absolutely necessary for the operation of the computer, such as the command interpreter, and
· Tools that aren't necessary for the operation of UNIX but provide the user with additional capabilities, such as typesetting capabilities and e-mail.

Tools can be added to or removed from a UNIX system, depending upon the applications required.

UNIX Communications

E-mail is commonplace today, but it has only come into its own in the business community within the last 10 years. Not so with UNIX users, who have been enjoying e-mail for several decades. UNIX e-mail at first permitted users on the same computer to communicate with each other via their terminals. Then users on different machines, even machines made by different vendors, were connected to support e-mail. And finally, UNIX systems around the world were linked into a worldwide network well before the development of today's World Wide Web.

Applications libraries

UNIX as it is known today didn't just develop overnight. Nor were just a few people responsible for its growth. As soon as it moved from Bell Labs into the universities, every computer programmer worth his or her salt started developing programs for UNIX.
Today there are hundreds of UNIX applications that can be purchased from third-party vendors, in addition to the applications that come with UNIX.

How UNIX is organized

The UNIX system is functionally organized at three levels:

· The kernel, which schedules tasks and manages storage;
· The shell, which connects and interprets users' commands, calls programs from memory, and executes them; and
· The tools and applications that offer additional functionality to the operating system

(Figure: the three levels of the UNIX system: kernel, shell, and tools and applications.)

The kernel

The heart of the operating system, the kernel controls the hardware and turns parts of the system on and off at the programmer's command. If you ask the computer to list (ls) all the
  3. 3. files in a directory, the kernel tells the computer to read all the files in that directory from the disk and display them on your screen.

The shell

There are several types of shell, most notably the command-driven Bourne Shell and the C Shell (no pun intended), and menu-driven shells that make it easier for beginners to use. Whatever shell is used, its purpose remains the same -- to act as an interpreter between the user and the computer. The shell also provides the functionality of "pipes," whereby a number of commands can be linked together by a user, permitting the output of one program to become the input to another program.

Tools and applications

There are hundreds of tools available to UNIX users, although some have been written by third-party vendors for specific applications. Typically, tools are grouped into categories for certain functions, such as word processing, business applications, or programming.

Logging In & Out

When you have established contact with the Unix system, the login prompt will be displayed. You must give your username followed by your password:

    login: lnp3jb
    Password: secret1

The username can be up to 8 characters in length. Unix usernames contain only lowercase characters, and it is important that you type your username in lower case (if you don't, you may still be permitted to log in, but the shell will then not recognise case differences). The password must normally contain between 6 and 8 characters. On some Unix systems the password must contain at least 1 non-alphabetic character.

System messages

When you log in a number of system messages may be displayed. The more filter will be used to control the output if the file contains more than a screenful of information. Just press the space bar to see the next screenful if it says 'more' at the bottom of the screen.

The message:

    You have new mail

indicates that electronic mail has been sent to your mailbox.
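The pipes described under "The shell" above can be tried with a few standard commands. The following is a minimal sketch in Bourne-shell syntax, not part of the original course material; the word list is made-up sample data, and sort, uniq and wc are standard Unix tools that this text does not otherwise cover.

```shell
# Pipe sketch: '|' feeds the output of the command on its left
# into the input of the command on its right.

# Four words, one per line, with "pear" repeated (sample data only).
words='pear
apple
pear
fig'

# sort brings identical lines together, uniq drops the repeats,
# and wc -l counts the lines that remain.
unique=$(echo "$words" | sort | uniq | wc -l)
echo "unique words:" $unique

# The more filter mentioned under "System messages" is normally used
# in the same way, for example:  ls /etc | more
```

Each command in the pipeline does one small job, and no intermediate file is needed to pass the data along.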
The prompt

When your login procedure is completed you should see the system prompt. This indicates that the shell is running and is awaiting instructions from the user. The prompt can take many forms, and you can change it later on if you want to. Often the prompt will contain the % character, and a number in brackets. This number represents the number of a command, and can be used to recall commands already issued. The prompt may also display the name of the machine or system that you are logged onto. Some users prefer to
  4. 4. have the name of the current working directory displayed in their prompt. For convenience, in this document, the % character will be used to represent the prompt.

Changing your password

Use the passwd command to change your password:

    % passwd                       -where '%' is the prompt
    Changing password for lnp5mw
    Old password:                  -type in your old password
    New password:                  -type in your new password
    Retype new password:           -and again, to make sure
    %

Logging out

When you have finished your Unix session you must log out from the system. To do this give the command:

    % logout

You should always wait for the message confirming that you have logged out. On some Unix systems you may receive the message:

    logout: command not known

If this happens you should type:

    exit

You may occasionally get the message:

    There are stopped jobs

If this happens simply give the logout command again.

--------------------------------------------------------------------------------
PRACTICE

Log in to the Unix system using your username and password. Change your password using the passwd command. You may find that the system will not change your password immediately. In this case you may have to use your old password next time that you log on.
  5. 5. --------------------------------------------------------------------------------

THE UNIX FILESTORE
------------------

File hierarchy

Unix has a hierarchical tree-like filestore. The filestore contains files and directories. The top-level directory is known as the root. Beneath the root are several system directories. The root is designated by the / character. The directories below the root are designated by the pathnames:

    /bin
    /etc
    /usr

Confusingly, the / character is also used as a separator in pathnames.

Historically, user directories were often kept in the directory /usr. However, it is often desirable to organise user directories in a different manner. Users have their own directory in which they can create and delete files, and create their own sub-directories. For example:

    /user/ei/eib035

belongs to someone who has the username eib035.

Some typical system directories below the root directory:

    /bin     contains many of the programs which will be executed by users
    /etc     files used by system administrators
    /dev     hardware peripheral devices
    /lib     system libraries
    /usr     normally contains applications software
    /home    home directories for different systems

The current directory

This refers to your actual location in the filestore hierarchy. When you log in the current directory is set to the home directory. You can then change current directory, effectively moving around the filestore tree structure. The current directory is also called the "current working directory" and the "working directory". The current directory can be referred to in pathnames by the . character (a full stop).

Changing current directory

The command cd is used to change your current directory. For example:
  6. 6. % cd bin

will move you from your current directory, down one "branch" to the directory bin, if such a directory exists. Typing cd with no arguments takes you to your home directory.

Display current directory

The command pwd is used to display your current directory. For example:

    % pwd
    /home/sunserv1_b/lnp5jb/bin

Pathnames

Files and directories may be referred to by their absolute pathname. For example:

    /home/sunserv1_b/lnp5jb/bin/hello

Files and directories may also be referred to by a relative pathname. For example, if your current directory is /home/sunserv1_b/lnp5jb, the above file can be referred to as:

    bin/hello

The home directory

Each user has a home directory. They will be attached to this directory when they log in. Jenny Brown's home directory is:

    /home/sunserv1_b/lnp5jb

The symbol ~ can be used to refer to the home directory. If Jenny Brown wishes to refer to her file she can give:

    ~/bin/hello

rather than typing the long form:

    /home/sunserv1_b/lnp5jb/bin/hello

The symbol ~ can also refer to the home directories of other users. For example Jenny can refer to a file in John Smith's home directory using:

    ~lnp5js/test.dat

The parent directory

The parent directory is the directory above the current directory. The parent directory can be referred to by the .. characters (two full stops). For example to refer to the file test.dat in the parent directory:
  7. 7. ../test.dat

Linking files

The ln command can be used to link files and directories across the filestore system. The symbolic link function (ln -s) is the most useful. This enables a file or directory to appear to be in a particular directory when it is in fact stored somewhere else. This can save the user from having to type out long pathnames for frequently used files or directories.

For example, if you want to use the files in /usr/games regularly, you can set up a symbolic link to this directory. If Jenny Brown is in her home directory and types:

    % ln -s /usr/games fun

this will create what appears to be a new directory below her home directory, entitled fun. When she does cd fun she will move to /usr/games. If she now does pwd, the current directory will appear as /home/sunserv1_b/lnp5jb/fun. Some things may be a little surprising however: the parent directory, for example, will be that of the original file or directory.

--------------------------------------------------------------------------------
Exercises

Check which directory you are currently in. If necessary, move to your home directory. (Remember: cd will do this from anywhere.)

Move to the root directory. ("Move to..." means "change your current working directory to...". It is useful to picture the process as movement around the tree structure.)

Work your way down one directory at a time to your home directory.

Experiment with using relative and absolute pathnames; show how the two can produce the same results.

Explore your system's filestore. Try to get into the home directory of someone else you know! (You may not be able to view their files.)

--------------------------------------------------------------------------------
UNIX COMMANDS
--------------------------------------------------------------------------------

Unix commands have the general format:

    command [options] [item]

Items in brackets are optional, and words in italics are generic identifiers (i.e.
options must be replaced by a particular option, e.g. -a). Note that:
  8. 8. Commands are case sensitive. The command ls is different from LS. In fact LS is not recognised as a valid command.

Command options consist of a single character. The command to list all the files in a directory is ls -a and could not be ls -all (the latter would have to mean a combination of options).

Command options can usually be combined or listed separately. For example:

    ls -al    or    ls -a -l

The command item is given last. This is very often a file name. For example:

    ls -a file1.f    not    ls file1.f -a

The echo command

The echo command 'echoes' its argument to the standard output. This means that in its simplest form it prints something out on screen. For example:

    % echo Hello        - you type
    Hello               - response from the shell
    %

Who is logged on?

The command who gives a list of logged on users:

    % who
    root      console  Jan  4 10:34
    men6matw  ttyp1    Jan  6 09:45  (ecusun1)
    cbl6nd    ttyp2    Jan  6 10:10  (cblslcd)
    cbl6ar    ttyp3    Jan  6 16:03  (cblsuna)
    csc6ea    ttyp4    Jan  6 14:15  (csuna1)
    root      ttyp5    Jan  6 10:40  (sun032)
    ecl6rsh   ttyp6    Jan  6 15:39
    csc6ea    ttyp8    Jan  6 14:15  (csuna1)
    lnp5mw    ttyUf    Jan  6 16:16
    lnp5jb    ttyp3    Jan  6 15:20  (sun051)

Also try the command finger. This command gives the full name of logged in users.

--------------------------------------------------------------------------------
PRACTICE

Type finger to get information on yourself and other users.
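The rule that options can be combined or listed separately, with the item given last, can be checked directly. The sketch below is only an illustration in Bourne-shell syntax; it works in a throwaway directory, and mktemp, touch and rm -rf are assumptions on top of this text (standard tools, but not ones the course has introduced). The file names are invented.

```shell
# Work in a throwaway directory so none of your own files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1.f .hidden

# Options combined and options listed separately should give the same listing.
combined=$(ls -al)
separate=$(ls -a -l)
if [ "$combined" = "$separate" ]; then
    same=yes
else
    same=no
fi
echo "ls -al matches ls -a -l: $same"

# The item comes last: a long listing of just one file.
ls -l file1.f

# Tidy up.
cd / && rm -rf "$dir"
```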
  9. 9. --------------------------------------------------------------------------------

Creating a directory

The mkdir command is used to create directories. The format of this command is:

    % mkdir directory_name

Jenny Brown stores her Unix scripts in a directory called scripts beneath her home directory. In order to create this directory she uses the command:

    % mkdir scripts

Deleting a directory

The rmdir command is used to delete directories. The format of this command is:

    % rmdir directory_name

Jenny Brown stores files for project work in a directory called proj. When the project has been completed she deletes the directory using the command:

    % rmdir proj

Note that the directory must be empty before it can be deleted.

Listing contents of a directory

The command ls is used to list the contents of a directory. For example:

    % ls
    file1 scripts test.f test

Notice that directories are listed as well as files. To list all files, including hidden files, give the command:

    % ls -a
    .cshrc file1 bin test.f test

Hidden files begin with . (a full stop). Hidden files are normally system files, and will normally include the following:

    % ls -a
    .cshrc .forward .history .login .logout

    .cshrc      contains commands that are executed every time you start off a C-shell, including when you log in
    .forward    enables you to redirect your mail to another computer
    .history    contains a record of previously executed commands
    .login      contains commands that are executed at login time
  10. 10. .logout     contains commands that are executed at logout time

(Table: the purpose of some hidden files.)

To identify directories in a listing give the command:

    % ls -F
    file1 bin/ test.f test

Notice how the directory is identified by the slash (/) character.

Deleting files

Files can be deleted using the rm command. For example:

    % rm test.f

Displaying files

The command cat is used to display the contents of a file on the screen. For example:

    % cat file1

Creating files

The command cat can also be used to create a file. For example:

    % cat > test.f

When typing in a new file the input must be terminated by ^D.

NOTE ^D means press the <ctrl> and the d keys simultaneously. Be careful not to type ^D when you have the shell prompt, because this might log you out.

Normally you would use an editor for creating files. This example is given since it illustrates how to create a small file without needing to learn the use of an editor.

Copying files

The command cp is used to copy a file. It takes the format:

    % cp old_file new_file

For example:

    % cp file1 file2

Renaming files

The command mv is used to rename a file.
  11. 11. For example:

    % mv file2 temp

changes the name of file2 to temp.

Moving files

The command mv is also used to move a file to a new location in the filestore hierarchy. For example:

    % mv file2 bin

moves the file file2 into the subdirectory bin.

Overwriting files

Commands such as rm and cp can be dangerous if not used with care. The command:

    % cp file1 file2

will overwrite file2 if a file of that name already exists. If you have spelled the name of the new file incorrectly you may accidentally overwrite the contents of another file.

Using the wildcard symbol * with the command rm can also be very dangerous. The command:

    % rm test*

will delete all files starting with test. However if you inadvertently type an extra space (do not try this!):

    % rm test *        -do not try this!

the file test will be deleted if it exists. Then all other files in the directory will be deleted! Often no warning will be given.

To prevent accidental deletion of files you can use the -i option with commands such as rm. The format of the command is:

    % rm -i file

You will be asked to confirm that files are to be deleted. You may find that this is set as the default on your system.

Wildcards

Wildcard characters can be used to identify directory and file names. The wildcard character * is used to refer to any combination of characters. For example:
  12. 12. % ls *          - refers to all files
    % cat test*     - refers to all files starting with 'test', e.g. 'test', 'testing', 'test.c', etc.

The wildcard character ? is used to refer to a single character. For example:

    % ls test?      - refers to files starting with 'test' followed by a single character, e.g. 'test1', 'test2', 'testz', etc.
    % cat test.?    - refers to all files starting with 'test' with a single character after the full stop, e.g. 'test.c', 'test.f'

--------------------------------------------------------------------------------
Exercises

Display your current working directory using the pwd command.

Make a directory called exercises. Change your directory to the directory exercises. Display the current working directory.

Return to your home directory. List the contents of your directory. Use the -l, -a and -F options and compare the output.

Change your directory to the directory exercises. Create a file called example1 using the cat command containing the following text:

    water, water everywhere
    and all the boards did shrink;
    water, water everywhere,
    Nor drop to drink

List the contents of your directory. Use the -l option to obtain a long listing.

Viewing files with the more command

The command more is used to display the contents of a file on the screen. The command is particularly useful for viewing long files since the display stops at the bottom of the screen. The following is a listing of a program in the Icon programming language:

    % more lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    global k
    # main body
    procedure main()
    # input word to be searched for
    write("Give me a word: \n")
    word:=read()
  13. 13. # this the important line - call the 'lookup' procedure
    if not write(lookup(word)) then write("Not found in the dictionary.")
    end
    procedure lookup(voc)
    # connect to the dictionary
    (dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
    # lookup algorithm
    every k:=1 to *voc do {
    --More-- (75%)

The message at the bottom of the screen means that 75% of the file has been viewed so far. (The amount shown on screen will depend on the type of terminal you are using.) You can now do the following:

    To continue viewing                                  press the space bar
    To view the next line                                press <RETURN>
    To quit                                              press the <q> key
    To jump to the next occurrence of a string           type /string
    For a list of valid commands                         press the <h> key

Viewing files with the pg command

The pg command is also available on some systems. This is an alternative to more.

    % pg lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    global k
    # main body
    procedure main()
    # input word to be searched for
    write("Give me a word: \n")
    word:=read()
    # this the important line - call the 'lookup' procedure
    if not write(lookup(word)) then write("Not found in the dictionary.")
    end
    procedure lookup(voc)
  14. 14. # connect to the dictionary
    (dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
    # lookup algorithm
    every k:=1 to *voc do {
    bit:=bite(voc)

Commands can be typed to the ':' prompt at the bottom of the screen:

    Type <RETURN> to view the next screen.
    Type <h> for a list of valid commands.

--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------

If you have a file longer than 20 lines use pg to view it. Compare the use of pg with more. Use them both on the file /etc/passwd, and find the listing for your own username.

Searching for strings in files

The command grep is used to search a file for a string of characters. For example, to search the file lookup.icn for the character '#' (which designates comments in the program), use the command below. The # is quoted because some shells would otherwise treat it as the start of a comment.

    % grep '#' lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    # main body
    # input word to be searched for
    # this the important line - call the 'lookup' procedure
    # connect to the dictionary
    # lookup algorithm

Many pattern matching operations can be carried out with grep. The following example shows the use of a regular expression. In this example, the search is restricted to lines beginning with the 'p' character; the ^ at the start of the pattern stands for the beginning of a line.

    % grep "^p" lookup.icn
    procedure main()            -output starts here
    procedure lookup(voc)
    procedure bite(voc2)
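The wildcard and grep examples above can be tried safely in a scratch directory, where a slip with * cannot delete anything that matters. This is a hedged sketch in Bourne-shell syntax rather than part of the original course: the file names and the cut-down sample.icn are invented for the illustration, and mktemp, the for loop, set -- and grep -c (which prints a count of matching lines) are standard features that this text does not otherwise cover.

```shell
# Scratch directory so the experiment cannot touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Invented file names chosen to exercise the wildcards.
for name in test test1 test2 testing test.c test.f other; do
    echo "sample" > "$name"
done

# 'set --' stores the expanded names so that $# can count them.
set -- test*
star_count=$#          # test test1 test2 testing test.c test.f
set -- test?
one_char_count=$#      # test1 test2
set -- test.?
suffix_count=$#        # test.c test.f
echo "test*: $star_count  test?: $one_char_count  test.?: $suffix_count"

# A cut-down, made-up stand-in for lookup.icn.
cat > sample.icn <<'EOF'
# main body
procedure main()
   write("Give me a word: ")   # prompt the user
end
procedure lookup(voc)
EOF

# Quote the pattern so the shell hands it to grep unchanged.
hash_lines=$(grep -c '#' sample.icn)    # lines containing '#'
proc_lines=$(grep -c '^p' sample.icn)   # lines beginning with 'p'
echo "lines with #: $hash_lines  lines starting with p: $proc_lines"

# Tidy up.
cd / && rm -rf "$dir"
```

Counting the matches first, as here, is also a cautious habit before using a wildcard with rm: list what the pattern expands to, then delete.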
  15. 15. You will learn more about pattern matching expressions later.

Control characters

The actual key sequences for the following operations can vary between different systems and different terminals. The most commonly used key sequences are described below. If a sequence is different on your system, remember the correct sequence and use it whenever the key sequences below are referred to later in the text. Where possible the operation itself is named (e.g. end-of-file), and not just the key sequence.

Deleting the last character typed

If you make a typing mistake you can delete the last character typed by using your delete key, which is usually the one marked <DEL> or <DELETE>.

Deleting the entire line

If you make many typing mistakes you can delete the entire line by typing ^U.

NOTE Remember ^U means "press the <CTRL> and <u> keys simultaneously".

Sending an interrupt

If you wish to terminate the execution of a command type ^C.

Sending an end-of-file character

In many Unix commands you need to finish your input with an end-of-file character. The default end-of-file character is ^D.

Printing on paper

This is usually called 'obtaining hard copy output', as distinct from output to the screen or a file. The command lpr sends a file to the line printer:

    % lpr file1

Note that the command lp is used on some Unix systems. The command:

    % lpr -Pprinter file

is used to submit the file to a specific printer.

--------------------------------------------------------------------------------
The locally developed command printers can be used to obtain a list of printers.
--------------------------------------------------------------------------------

Getting help
  16. 16. The command man is used to display help on the syntax of Unix commands. The format of this command is:

    % man [option] [file]

For example to obtain help information on the who command, type:

    % man who

The keyword option -k keyword is used to display a list of help files associated with the keyword. For example to display a list of all man files associated with password type the command:

    % man -k password
    getpass(3)    read a password
    passwd(1)     change login password
    passwd(5)     password file

The command man automatically invokes the more program for viewing files. You can use the normal more commands to continue viewing.

--------------------------------------------------------------------------------
If you have any problems that can't be solved by referring to the manual, please consult your supervisor or the Advisory Service. The Help Desk can be contacted in person in the User Access Area, on the telephone on extension 5366, or by email to helpdesk. Also the LUCS Unix system operators can be contacted on telephone extension 5380. With non-urgent problems, an email message to your supervisor is usually the most efficient way of getting help. (See next chapter on how to use email.)
--------------------------------------------------------------------------------

Exercises

1. Display a list of logged on users.
2. Obtain further information for a particular user using the finger command.
3. Use the man command to obtain further information on the finger command.
4. Use the man -k command to find what manual entries there are related to passwords.
5. Use the grep command to search the file example1 for occurrences of the string 'water'.
  17. 17. 6. Use man and the keyword option to find out more information on communications and e-mail in Unix.
7. Print out a file on paper.

COMMUNICATIONS
--------------------------------------------------------------------------------

Mail

The mail command enables the user to send and receive electronic mail messages to and from both users on the same Unix system and remote users. This is the basic mail command. Enhanced versions, such as programs that run under a window system (e.g. mailtool), or screen-based versions of mail (e.g. elm) may be available, and you will probably find them preferable to mail. If so, much of the following can safely be ignored. Remember however that some version of mail will definitely be available on any Unix system that you use.

Sending mail

To send a message to a user on your system, type:

    % mail username

The cursor will move to the next line, and you will get a Subject: prompt. You can now type in the subject of your message, and then press <RETURN>. The cursor will go to the start of the next line and there will be no prompt. You now type in the text of your message. Terminate each line with <RETURN>. When you have finished the text of the message, type an end-of-file character (usually ^D), or a full stop on a line by itself. You should now return to your normal shell prompt. If the message is dispatched successfully, you will hear no more about it. The following is an example of the mail command in action:

    % mail lnp6ttld
    Subject: UNIX course
    I don't think I'll ever be able to get the students in the UNIX course to understand how to use e-mail.
    ^D
    %

Entering the text of the message by this method is a rather crude process. Errors on the line being typed can be erased with your delete key, but once you have pressed <RETURN>, a line cannot be edited. A message may be aborted by pressing ^C twice.

--------------------------------------------------------------------------------
  18. 18. PRACTICE
--------------------------------------------------------------------------------

Send yourself a message. (You will find out where it has gone in the next section.)

Subcommands while entering mail

There are several commands you can type while entering mail:

    <CTRL/Z>    will cancel the message, and leave the text in a file named dead.letter.
    ~e          invoke a text editor to edit your message.
    ~v          invoke a screen editor to edit your message.
    ~f          reads the contents of the message you have just read into your message text.
    ~r file     reads the contents of file into your message text.

While this method is quick and easy to use, and quite adequate for short and simple messages, many users prefer to first create a file containing the text of the message, and then mail this file to the intended recipient. This enables you to use any system editor and formatter to create the message, and you do not need to send it immediately. The following command shows how to send a file called note, containing the text of a message, to another user.

    % mail lnp6ttld < note

To understand fully how this works see the section on 'Re-direction of standard output' in Chapter 8 below. In this example the message will not contain a subject heading, unless one has already been included as the first line of the file note. There is a -s option with the mail command, that can be used to include a subject header, as follows:

    % mail -s UNIX lnp6ttld < note

The string following the -s is the subject; in this case, the subject is "UNIX".

Receiving mail

If new mail is waiting for you when you log in, you will see the message:

    You have new mail
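The `< note` part of the sending commands above makes the file the command's standard input. The sketch below, in Bourne-shell syntax, shows that mechanism with wc -l standing in for mail, so that running it sends nothing; the note text is invented, and mktemp, printf and wc are assumptions beyond what this text covers. The mail line itself is left as a comment.

```shell
# Create a two-line note in a temporary file.
note=$(mktemp)
printf 'The UNIX course starts at 10.\nBring your username.\n' > "$note"

# '<' makes the file the command's standard input. wc -l simply counts
# the lines it is given, so it shows what arrives without sending mail.
lines=$(wc -l < "$note")
echo "lines read from the note:" $lines

# mail reads its message text the same way. Quotes keep a subject of
# more than one word together as a single argument to -s:
#   mail -s "UNIX course" lnp6ttld < "$note"

rm -f "$note"
```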
  19. 19. To start the mail program type the command:

    % mail

Each message is summarised on a numbered list. The current message is marked with a ">" character. The mail prompt character is "&". Type the number of the message you want to read, or just press <RETURN> to read through the list. The list of mail headers will look something like this:

    % mail
    Mail version SMI 4.0 Thu Oct 11 12:59:09 PDT 1990  Type ? for help.
    "/usr/spool/mail/lnp5jb": 2 messages 2 new
    >N  1 lnp5mw   Thu Jan  9 15:10  11/262  hello
     N  2 lnp5js   Thu Jan  9 15:11  10/287  party
    &

This tells Jenny Brown that she has two messages, one from user lnp5mw, and one from lnp5js. The date and time at which the messages were received are also listed, and so is the subject header (the last item on each line - here 'hello' and 'party'). The following commands can be entered at the mail prompt:

    d            Mark the current message for deletion
    d n          Mark message number n for deletion
    u n          Undelete message number n
    s file       Save the current message in file with the mail header and mark for deletion
    w file       Save the current message in file without the mail header and mark for deletion
    r            Reply to the current message
    q            Quit mail, removing deleted messages from your system mailbox. Undeleted messages that have been read are normally stored in your personal mailbox (see below)
    x            Exit mail, leaving your mailbox untouched, i.e. messages deleted in this session are restored
    h            Show list of message headers
    ?            List the useful mail commands
    ! command    Execute specified shell command
  20. 20. -            Re-read previous message
m recipient  Send mail to named recipient

Files used by mail

~/mbox     Your personal mailbox, located in your home directory. This is where messages that you have saved are stored, unless you specified another location when you saved them. You can access this file by issuing the command:

% mail -f mbox

~/.mailrc  A file that can hold commands for mail to obey when it starts up.

--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
See if you have received any mail. If you have, save a message to your mailbox file. Send yourself another message, and this time discard it. Send a message to another user.

Sending mail to remote users

The following also applies to the elm mail program. Sending mail to users on other computer systems is simple using mail. Simply type the full address of the remote user where the system username is used above. For example:

% mail lnp5mw@uk.ac.leeds.gps

or

% mail -s Hello ecl6rsh@uk.ac.leeds.cms1 < note

These two examples correspond to the two ways of sending mail shown above: typing the message interactively, and mailing a prepared file.

It is also possible to use mail to look at folders of mail that you have already received. To do this type:

% mail -f folder_name

and it will treat the messages in the folder as incoming mail.

Sending on-line messages

As you have seen, messages sent using mail are received in a special buffer, and it is up to the recipient when to look at them and what to do with them. It is also possible to send
  21. 21. a message that will simply appear on the screen of the recipient, if they are logged on. This is less useful than mail for the following reasons:

· mail can be used irrespective of whether the recipient is logged on or not.
· mail messages can be stored by the recipient. This means that files can be transferred by mail, and a record of transactions can be kept.
· On-line messages can be confused with whatever the recipient has on screen and can easily disrupt what they are doing. They can be very annoying!

On the other hand, on-line messages do have the advantage of obtaining the immediate attention of another user, and it is possible to have an interactive conversation. Bearing these facts in mind, use the following command with caution!

write

The write command is used to send on-line messages to another user on the same machine. The format of the write command is as follows:

% write username
text of message
^D

After typing the command, you enter your message, starting on the next line, and terminate it with the end-of-file character. The recipient will then hear a beep, then receive your message on screen, with a short header attached. The following is a typical exchange. User lnp5jb types:

% write lnp8zz
Hi there - want to go to lunch?
^D
%

User lnp8zz will hear a beep and the following will appear on his/her screen:

Message from lnp5jb on sun050 at 12:42
Hi there - want to go to lunch?
EOF

If lnp8zz wasn't logged on, the sender would see the following:

% write lnp8zz
lnp8zz not logged in.
  22. 22. SunOS has the talk command. This has several advantages over write. Firstly, talk can call other machines on a network. Secondly, talk provides a clearer interface for the exchange of messages, dividing the screen into two windows for the interlocutors. Type

% talk username@machine

to start a conversation.

--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Try to have an extended on-line conversation with another user.

You can stop messages being flashed up on your screen if you wish. To turn off direct communications type:

% mesg n

It will remain off for the remainder of your session, unless you type:

% mesg y

to turn the facility back on. Typing just mesg lets you know whether it is on or off.

Remote logins

It is possible to log on to another machine on a Unix network, provided that you have permission to do so. To do this use the rlogin command. Type:

% rlogin machine

and you will be asked for your password. It may be necessary for you to do this to make on-line communications with another user easier.

--------------------------------------------------------------------------------
Exercises

1. Send a message to another user on your Unix system, and get them to reply.
2. Create a small text file and send it to another user.
  23. 23. 3. When you receive a message, save it to a file other than your mailbox. (Remember you can always send yourself a message if you don't have one.)
4. Send a message to a user on a different computer system.
5. Send a note to your course tutor telling them that you can use mail now.

FILE PERMISSIONS
--------------------------------------------------------------------------------
What are file permissions?

The Unix file security system can prevent unauthorised users from reading or altering files. Every file and directory has specific permissions associated with it, giving different categories of user certain permissions to look at or change a file, and to run executable files.

NOTE Executable files are files containing commands that can themselves be executed as if the file itself were a command.

The file permissions can be displayed using the command:

% ls -l [filename]

For example, to display the permissions on the file lookup.icn, type the command:

% ls -l lookup.icn
-rw-r--r-- 1 lnp5jb 777 Dec 18 lookup.icn

The first set of characters in the output from the command (-rw-r--r--) gives the permissions. The username in the middle of the line (lnp5jb) is the owner of the file. This is the user who created the file. The following fields tell you the number of characters in the file, the date it was last modified and the name of the file.

Note that the first character specifies the file type. This is normally one of the following:

-  indicates a file
d  indicates a directory

The following nine characters represent permissions for different classes of users. Users on a Unix system are assigned to a group or groups, which might correspond to a
  24. 24. particular department, or research group in the real world. Members of a particular group can be allowed access to files belonging to other members of the group.

The second, third and fourth characters in the permissions string represent permissions that apply to the owner of the file. The next three characters apply to members of the owner's group. The last three apply to all other users. The file in this example therefore has rw- for the owner, r-- for the group and r-- for others.

The three characters corresponding to each class of user each represent a different type of permission.

The first character represents 'read' permission. This means that a user has permission to open a file and view the contents. If there is an r in this position then that class of users has read permission. In this example all users have read permission. In this, and in every case, a horizontal bar character (-) means that permission is denied.

The second position represents 'write' permission (the right to make changes to a file). In the example, only the owner has write permission. Normally, you will not want others to be allowed to make changes to your files, so write permission is only allowed to the owner.

The third position represents 'execute' permission. This means permission to 'execute', or run, a file that works like a command. In this example no-one has execute permission for the file lookup.icn (it is an Icon program, and it would have to be compiled before it could be executed, so execute permission would be useless).

To summarise the above, this is how the permissions string is divided up:

    -              rw-      r--      r--
    type of file   owner    group    others

Here is another example, this time an executable file:

-rwxr-x--x 1 lnp5jb 562 Jan 10 hello

This tells us that hello is a file; the owner is lnp5jb; the owner has read, write and execute permission; the group has read and execute permission; others just have execute permission.
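The two permission strings discussed above can be reproduced on a scratch file. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the file name demo.txt and the temporary directory are made up for illustration, and it uses the chmod command (covered in the next section) only to put the file into a known state.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create a file and give it the permissions from the first example:
# rw- for the owner, r-- for the group, r-- for others.
echo "some text" > demo.txt
chmod u=rw,g=r,o=r demo.txt

# The first ten characters of the long listing are the file type
# followed by the nine permission characters.
perms=$(ls -l demo.txt | cut -c1-10)
echo "$perms"

# Now mirror the second example: rwx for the owner, r-x for the group,
# --x for others.
chmod u=rwx,g=rx,o=x demo.txt
perms2=$(ls -l demo.txt | cut -c1-10)
echo "$perms2"
```

Reading each printed string in groups of three after the first character gives the owner, group and others permissions in turn.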
-------------------------------------------------------------------------------- PRACTICE -------------------------------------------------------------------------------- What are the default permissions for your files and directories? Are they all the same?
  25. 25. When you copy a file what file permissions does the new file have?

Changing file permissions

The command chmod is used to change the permissions on a file. The format of this command is:

% chmod mode filename

For example, to add read permission for the group to the file file1, give the command:

% chmod g+r file1

chmod modes

In the command:

% chmod mode filename

the mode consists of three elements:

who operator permissions

The following options are possible:

who:
    u  user (owner)
    g  group
    o  other
    a  all
operators:
    -  remove permission
    +  add permission
    =  assign permission
permissions:
    r  read
  26. 26. w  write
    x  execute

For example:

chmod o-rw file1.f    removes read and write permissions from others.
chmod u+x test        adds execute permission for the owner.

Permissions for directories

Read, write and execute permissions are set for directories as well as files. Read permission means that the user may see the contents of a directory (e.g. use ls for this directory). Write permission means that a user may create files in the directory. Execute permission means that the user may enter the directory (i.e. make it their current directory).

--------------------------------------------------------------------------------
Exercises

1. Try to move to the home directory of someone else in your group. There are several ways to do this, and you may find that you are not permitted to enter certain directories. See what files they have, and what the file permissions are. (Remember that you can protect your own files from prying eyes, or from interference.)
2. Try to copy a file from another user's directory to your own.
3. Set permissions on all of your files and directories to those that you want. You may want to give read permission on some of your files and directories to members of your group.

STANDARD INPUT AND OUTPUT
--------------------------------------------------------------------------------
Standard input

Input to Unix commands is normally given from the keyboard. For example you can use the cat command interactively:

% cat
  27. 27. Hello     - you type
Hello     - response
there     - you type
there     - response
^D        - you type
%

Note that input from the keyboard is terminated with the end-of-file character, usually ^D. For another example consider the spell command, which is the Unix spelling checker:

% spell                        - you type
Input to the spell ulitity     - you type
is typed at the keyboard       - you type
^D                             - you type
ulitity                        - response

The spell command outputs words that are incorrectly spelled in the input.

Standard output

Output from Unix commands is normally displayed on the screen. For example:

% spell
Input to the spell ulitity
is typed at the keyboard
^D
ulitity                        - output

--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Try out the spell checker. See how it copes with British spellings (remember it's an American system), proper nouns, hyphens and recently coined vocabulary.

Re-direction of standard input

It is possible to redirect standard input so that the input is taken from a file. Imagine you wish to check for spelling errors in a report. The text can be put into the file report, which can then be fed into the spell command:

% cat > report
Input to the spell ulitity
can come from a file
^D
% spell < report
ulitity

The < character is used to re-direct the input from the file report to the command spell. The general format for re-direction of user input is:
  28. 28. command < filename

Another common use of re-direction of standard input is to mail a file to another user. The command:

% mail lnp8zz < report

will mail the file report to local user lnp8zz.

Re-direction of standard output

You do not always want the output from a Unix command to be displayed on the screen. It has already been shown how it is possible to direct the output from the cat command to a file. Imagine you want a list of your files and directories kept in a file. You would use the command:

% ls > filelist

The > character is used to re-direct the output from the command to the file called filelist. The general format for re-direction of user output is:

% command > filename

Note that output directed to the file /dev/null is effectively discarded. This is the system 'wastebasket'.

Another example involves directing the output of echo to a file:

% echo "Hello there" > greeting

This would normally overwrite any existing contents of the file greeting. Study the following sequence:

% echo "Hello there" > greeting
% cat greeting
Hello there
% echo "This instead" > greeting
% cat greeting
This instead

It is possible to append output to a file, rather than overwriting it, by using the >> operator. For example:

% echo "Hello there" > greeting
% cat greeting
Hello there
  29. 29. % echo "and goodbye" >> greeting
% cat greeting
Hello there
and goodbye

Look carefully at the difference between these two examples.

Re-direction of input and output

It is possible to re-direct both standard input and output. If you have a report containing many spelling mistakes you may wish to keep a list of the mistakes in a file. You can do this using the following command:

% spell < report > errors

Piping

Output from one command can be sent ('piped') to the input of another command using the | character:

command1 | command2

A common use for pipes is to control the output of large files to the screen. It is possible to send output to the more command so that only one screenful at a time is output. If the command

% ls -l

is used to give a long listing of all files and directories there may be too many lines to see them all at once on the screen. (If you don't have many files, move to /etc where there should be plenty.) Output from ls -l can be piped to more as follows:

% ls -l /etc | more

You can then use the usual more commands to control the output.

In the output from ls -l, directories are identified by the d character at the start of each line. A list of just the directories can be obtained by piping the output of this command to the grep command, giving grep a pattern which will select only lines containing the d character at the start of the line. The command is:

% ls -l | grep "^d"

The commands sort and grep are often used when piping. For example:

% cat phonenos | sort | lpr
  30. 30. will send an alphabetically sorted list of the phone numbers contained in the file phonenos to the line printer. The command:

% cat phonenos | grep leeds | sort | lpr

will send a sorted list of phone numbers containing the string 'leeds' to the line printer.

--------------------------------------------------------------------------------
Exercises

1. Put a listing of the files in your directory into a file called filelist. (Then delete it!)
2. Create a text file containing a short story, then use the spell program to check the spelling of the words in the file.
3. Redirect the output of the spell program to a file called errors.
4. Type the command ls -l and examine the format of the output. Pipe the output of the command ls -l to the word count program wc to obtain a count of the number of files in your directory.

AN INTRODUCTION TO THE EX LINE EDITOR
--------------------------------------------------------------------------------
What's ex for?

Editors available on Unix include:

ed     basic line editor
ex     line editor
vi     screen editor
emacs  screen editor

Ex is an enhanced and more friendly version of ed. Vi is a screen-based version of ex. Most users have no practical use for a line editor nowadays; line editors are really a relic of an earlier age in computing. However, you may occasionally have to use ex, if for some reason you can't run a screen editor on your terminal. It is covered here mainly to teach something else, namely, the way that Unix handles texts. This is perhaps most transparent when you are using ex. Ex forces the user to use complicated pattern matching operations to do things that are comparatively easy with a screen editor, such as correcting small typing errors in the text. While taking this approach may at times seem
  31. 31. unnecessarily difficult, it should be remembered that what follows here is just a stepping stone to other Unix utilities, such as vi (which you are far more likely to want to use as an editor than ex), and commands that use regular expressions, such as grep, tr and awk. Learning to use ex involves skills necessary for getting the most out of these utilities.

Using ex

Starting ex

The command ex is used to invoke the editor. The format of this command is:

% ex [filename]

A filename can be supplied if you wish to edit an existing file.

% ex oldfile
"oldfile" 10 lines 465 characters
:

Alternatively the filename may be used as the name of a new file:

% ex newfile
"newfile" [Newfile]
:

Notice that the prompt for ex commands is the ':' character.

Adding Text

To enter text simply type the command a (short for append), and then type in the text, as follows:

:a
This is the text

Input is terminated by typing a full stop ('.') on a new line:

:a
This is just one line of text
.
:

The command i is used to insert text before the current line.

Saving Your Data

The command w (short for 'write') is used to save your data. The format of this command is:

:w [filename]
  32. 32. If no filename is specified, the filename given when ex was invoked will be used. E.g.:

:w test.f
test.f 50 lines 576 characters
:

The number of lines and characters in the file will be displayed.

Quitting the Editor

The command q (short for 'quit') is used to quit the editor. Note that if changes have been made to the file and have not been saved the editor will respond with a warning message:

No write since last change (:quit! overrides)

The command quit! (or just q!) must be given if you wish to quit without saving your changes.

Displaying Lines in the File

The p command (for 'print') is used to display lines in the file. The format of this command is:

:[line_range] p

If no range is supplied the current line is displayed. Pressing <RETURN> is equivalent to moving on to and displaying the next line. With small files it is possible to display the entire file by pressing <RETURN> until the end of the file is reached.

Line Ranges

Ranges of lines that can be given to edit commands include:

Absolute line number
    6      refers to line 6
    1,6    refers to lines 1 to 6
Relative line numbers
    -2     refers to 2 lines before the current line
    +3     refers to 3 lines after the current line
    -2,+3  refers to a range from 2 lines before the current line to 3 lines after the current line
  33. 33. Special symbols
    $      refers to the last line in the file, e.g. $p to display the last line, 1,$p to display the entire file
    .      refers to the current line, e.g. .,$p to display from the current line to the end

Examples:

6d          - deletes the sixth line
1,6d        - deletes the first six lines
1,$d        - deletes all lines
3a          - appends text after line three
.,+10w new  - saves the current line and the next ten lines to a file called new

The = operator gives the line number, with the last line the default, so typing = gives you the number of lines in a text. The number of the current line is obtained by typing .=.

Deleting Lines

The d command is used to delete lines. The format of this command is:

:[line_range] d

If no line number is given the current line will be deleted. It is possible to supply a range of lines. For example:

:1,$d

will delete the entire file.

Searching

Searches are carried out by enclosing the search string in slashes ('/'):

/string/

The search will start at the current line.

:/Jane/
This is Jane's file

The special characters '^' and '$' can be used to assist the search. For example:

/^This/  will find a line beginning with 'This'
/file$/  will find a line ending in 'file'

The last string searched for is the default string. This means that you can repeat a search just by typing //.
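The '^' and '$' anchors described above behave the same way in grep, which makes them easy to try outside the editor. This is a small sketch, assuming a POSIX shell with mktemp available; the file sample.txt and its three lines are invented for the illustration.

```shell
# Work in a throwaway directory with a small made-up sample file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' "This is Jane's file" "Send This to Jane" "Jane lost the file" > sample.txt

# ^ anchors the pattern to the start of the line:
# only the first line begins with 'This'.
starts=$(grep '^This' sample.txt)
echo "$starts"

# $ anchors the pattern to the end of the line:
# the first and third lines end in 'file'.
ends_count=$(grep -c 'file$' sample.txt)
echo "$ends_count"
```

The second line contains 'This' but not at the start, so it is not selected by the first search.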
  34. 34. Reverse Searches

Reverse searches are carried out by enclosing the search string in question marks ('?'):

:?string?

The search will start at the current line and search backwards through the file.

Making Substitutions

The s command is used to make substitutions. The format of this command is:

:[line_range]s/old_string/new_string/

If no line number is given substitutions will be made only on the current line. For example:

:s/old/new/

will substitute the first occurrence of the string 'old' with 'new' on the current line. The command:

:.,$s/old/new/

will substitute the first occurrence of the string 'old' with 'new' in every line from the current line to the end of the file.

Global Substitutions

The g command (for 'global') is used to make multiple substitutions on a line. For example:

:s/old/new/g

will substitute all occurrences of the string 'old' with 'new' on the current line. The command:

:1,$s/old/new/g

will substitute all occurrences of the string 'old' with 'new' in the file.

Search strings can also be used in conjunction with the s command in order to carry out more sophisticated global changes. The line range preceding a substitution string may include a search for the string to be changed. For example:

:g/old/s//new/g
  35. 35. This means: search globally for 'old', then replace every occurrence with 'new'. Remember the null string (in s//) stands for the last RE, in this case the RE 'old'. This is the same as:

:1,$s/old/new/g

Additional ex facilities

Additional commands available using the ex editor include:

c  replaces lines
t  transfers (copies) lines
m  moves lines
j  joins lines
l  shows invisible characters
f  gives the name of the file being edited
r  inserts named file
e  edits named file
u  undoes last change

The commands m and t above work in a similar way, in that they require two line addresses, one before and one after the command. The address in front refers to the source and the address after to the destination. If either is omitted, the current line is assumed. Line addresses may be ranges, allowing blocks of text to be moved. Here are a few examples of commands:

:.m2
This moves the current line to a position after line 2.

:1,.m$
This moves a block (line 1 to the current line) to the end of the text.

:1,.t$
This copies the block to the end of the text, leaving the original block untouched.
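The difference between a plain substitution and a global one, described in the substitution sections above, can also be tried without starting the editor. The sketch below uses sed, a stream editor introduced later in this document, on the assumption that its s command follows the same syntax as the one in ex; the sample line is made up.

```shell
# A made-up line containing the search string three times.
line="old hat, old coat, old shoes"

# Without the g flag only the first occurrence on the line is replaced.
first=$(echo "$line" | sed 's/old/new/')
echo "$first"

# With the g flag every occurrence on the line is replaced.
all=$(echo "$line" | sed 's/old/new/g')
echo "$all"
```

Because sed applies its command to every input line, the second form corresponds to :1,$s/old/new/g in ex.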
  36. 36. --------------------------------------------------------------------------------
Exercises

1. Create a file using ex. Put the text of a message in the file and then mail it to someone (see chapter on mail).
2. Use ex to explore the file /etc/passwd. Search for your own listing, and those of others in your group. (You won't be able to save changes to the file.)
3. Find a text file to which you have access and copy it to your home directory. Try making some changes to it.

REGULAR EXPRESSIONS
--------------------------------------------------------------------------------
What are regular expressions?

A regular expression (RE) is a string of characters that can be used to match a set of character strings. For example, to globally search for all occurrences of the word "and" would require a search for "and", "And", "AnD", "AND", etc. Without regular expressions finding all possible occurrences of "and" would require eight separate searches. Using an RE the search could be done with one command.

Regular expressions are used by many Unix utilities, including:

ed ex vi grep sed awk

(The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code. Awk is not covered in this course, but the GAWK Manual is a good guide to its use.)

Regular expressions are used in searches and substitutions.
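As a rough illustration of the "and" example above, the eight upper and lower case variants can be covered by one pattern built from three character sets (character sets are explained in the next section). This is a sketch only, assuming a POSIX shell with mktemp and grep; the file words.txt and its contents are invented.

```shell
# Work in a throwaway directory with a small made-up word list.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' "salt and pepper" "And then" "AND gate" "no match here" "AnD so on" > words.txt

# Each bracketed set matches one letter in either case, so the single
# pattern covers and, And, aNd, anD, ANd, AnD, aND and AND.
matches=$(grep -c '[Aa][Nn][Dd]' words.txt)
echo "$matches"
```

Note that the pattern matches the three letters anywhere on a line, so a word such as 'sand' would also be selected.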
  37. 37. Character strings

A character string is the simplest regular expression; it simply matches the string itself. For example:

/hello/            - matches 'hello'
s/hello/goodbye/   - matches 'hello' and makes a substitution

Matching single characters

The '.' character is used to match a single character. For example:

/p.t/   - matches 'p' and 't' separated by a single character, e.g. 'pit', 'put', 'pot', etc.

Sets of characters

The expression /[RE]/ is used to match a set of characters in a single character position. For example:

/x[ab2X]y/   - matches any of the following: xay xby x2y xXy

In the expression /[RE]/ a range of characters can be specified. For example:

[a-z]   - matches any single lower case character
[0-9]   - matches any single digit

Note however:

[0-57]  - matches any one of the following: 0 1 2 3 4 5 7, i.e. 0-5 and 7.

Sets of characters can be combined:

[a-d5-8X-Z]   - matches any one of the following: a b c d 5 6 7 8 X Y Z

It is possible to specify a set of characters which are not to be matched in the RE. For example:

[^0-9]   - matches any single character which is not a digit

Anchors

An anchor is used to match a RE found at a particular position. For example:

/^RE/    - matches RE at the start of a line
/RE$/    - matches RE at the end of a line
/^RE$/   - matches RE as the whole line
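The character sets above can be checked with grep on a few one-word lines. The following is a minimal sketch, assuming a POSIX shell with mktemp and grep; the file names and test words are made up for the purpose.

```shell
# Work in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

# x[ab2X]y matches xay, xby, x2y and xXy, but not xcy or xy.
printf '%s\n' xay xby x2y xXy xcy xy > sets.txt
set_count=$(grep -c 'x[ab2X]y' sets.txt)
echo "$set_count"

# [0-57] is the range 0-5 plus the single character 7, not "0 to 57".
# The anchors make the set match the whole line.
printf '%s\n' 0 5 6 7 8 > digits.txt
digit_matches=$(grep '^[0-57]$' digits.txt | tr '\n' ' ')
echo "$digit_matches"
```

The second search prints 0, 5 and 7 only, which shows that a set always stands for exactly one character position.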
  38. 38. Note that there are two separate uses of the '^' operator. One is as the start of line anchor, and the other as the 'logical not' operator. The latter function only applies inside square brackets.

Repetitions

Multiple occurrences of REs can be specified. For example:

a*    - matches 0 or more occurrences of 'a'
aa*   - matches 1 or more occurrences of 'a'
.*    - matches any string of characters

Remembered regular expressions

A null RE stands for the last RE. For example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(The blue car)/p
(The blue car) exploded with a roar.

The '&' character in a replacement string stands for the most recently matched string. For example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(&)/p
(The blue car) exploded with a roar.

Sub-expressions

A sub-expression in a RE can be referred to.

\(string\)   - defines an RE sub-expression
\n           - refers to the nth RE sub-expression

NOTE The backslash is the escape character for REs. This means it neutralises the special meanings of special characters.

For example:

:p
A line of text
:s/\(line\).*\(text\)/\2 \1/p
A text line
:

Repetition

It is possible to specify multiple occurrences of REs. For example:

c\{4\}     matches exactly 4 c's
c\{4,\}    matches 4 or more c's
c\{2,4\}   matches between 2 and 4 c's
  39. 39. For example, to find a line containing 5 digits:

/[0-9]\{5\}/

A summary of special characters

Special characters in the search string
    ^    start of line anchor (or NOT operator inside [ ])
    $    end of line anchor
    .    any character
    *    preceding character repeated any number of times
    \    escape character
    [ ]  contains range of characters

Special characters in the replacement string
    &    string matched in search string
    \    escape character

Note that any regular expression can be used with grep. (It gets its name from the editor command g/RE/p which means 'globally search for RE and print it'.) This opens up many new possibilities for the use of grep. Unix commands that use regular expressions often make the use of an editor redundant.

--------------------------------------------------------------------------------
PRACTICE

Obtain a listing of the members of your group from the password file using grep.
--------------------------------------------------------------------------------
Introduction to sed

sed is a non-interactive stream editor which is used for text. The command to invoke sed is:

sed [-n] [-e command] [-f edfile] [input_file]

For example:
  40. 40. sed "s/UNIX/Unix/g" thesis > thesis.new

This will process the file thesis line by line, outputting each line to the file thesis.new and replacing each occurrence of the string "UNIX" with "Unix".

In the above example every line of thesis will be output to thesis.new, irrespective of whether it has been changed or not. This is because the default output for sed is every line of the input. Using the -n option suppresses the default output, and only specified lines are output. Applied to the above example, this means that the following command would output no lines:

sed -n "s/UNIX/Unix/g" thesis > thesis.new

since a change but no output has been specified. If a print command is added, as follows:

sed -n "s/UNIX/Unix/gp" thesis > thesis.new

then only those lines in which "UNIX" had been changed to "Unix" would be output.

As you can also see in the example, the -e option is not necessary when there is only one editor command. It is possible to specify more than one command, and in this case each must be preceded by -e. For example:

% sed -e "s/a/A/" -e "s/b/B/" file1 > file2

This command will carry out the two substitutions on each line of file1.

The -f option enables the user to use a file containing editor commands, instead of typing out a series of commands with the -e option.

sed examples

The sed command to list only files (exclude directories) is:

% ls -l | sed -n "/^-/p"
-rw------- 1 lnp5jb 1765 mbox
-rw------- 1 lnp5jb 320 example1

The sed command to extract a list of usernames from the password file is:

% sed "s/:.*//" /etc/passwd | more

What this does is to delete everything from the first ':' onwards on each line of the password file.
--------------------------------------------------------------------------------
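The sed behaviour described above can be tried safely on a small made-up file rather than on the real password file. This is a sketch under stated assumptions: a POSIX shell with mktemp is available, the file passwd.sample is invented, and its second entry (and both full-name fields) are hypothetical.

```shell
# Work in a throwaway directory with a made-up file in password-file format.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > passwd.sample <<'EOF'
lnp5jb:x:1001:100:Jenny Brown:/home/lnp5jb:/bin/csh
lnp5mw:x:1002:100:Another User:/home/lnp5mw:/bin/csh
EOF

# Delete everything from the first ':' to the end of each line,
# leaving only the usernames.
users=$(sed 's/:.*//' passwd.sample)
echo "$users"

# -n switches off the default output; the p flag then prints only
# the lines on which a substitution was actually made.
changed=$(printf '%s\n' "UNIX is fun" "nothing here" | sed -n 's/UNIX/Unix/gp')
echo "$changed"
```

Dropping the p flag from the last command while keeping -n would print nothing at all, as explained above.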
  41. 41. Exercises

1. Reproduce the effects of the above sed examples using grep instead. Note that grep is generally better for searches such as this, while sed can be used to make changes to files.
2. Find the system's games directory and type quiz function ed-command to do the ed commands quiz. Don't worry if there are a couple of things that you haven't come across. Try it again and see if you improve your score.

PROCESSING LARGE TEXT CORPORA
--------------------------------------------------------------------------------
This section will focus on exploiting large files containing linguistic material with the use of the commands already covered plus many more.

Compressed files

Often large files are compressed to save disk space. If this is the case then the user must make the file revert to its original format in order to be able to do anything with it. A popular compressing command is called, simply, compress. The command:

% compress filename

will cause the file to be replaced by a compressed file with a .Z suffix. The command uncompress will cause it to revert to its original format.

It is often not necessary to uncompress a file to use it. In fact, the file will often be owned by someone else, and you would have to copy it and then uncompress it, using up a great deal of disk space and processor time. It is often better to use the zcat command, which sends the uncompressed contents of a compressed file to the standard output, while leaving the compressed version of the file in the filestore.

--------------------------------------------------------------------------------
PRACTICE

Try compressing and uncompressing some of your own files. Find a large compressed file on your system and search it for some appropriate string using grep without uncompressing the file.
--------------------------------------------------------------------------------
Some useful commands for processing text files
  42. 42. The following is a summary of some useful commands for processing text files, some of which you have met already, some of which are new to you. Both have been included so that this section can easily be used for reference purposes. Not all of these commands are standard Unix, so they may not all work in the way you expect (or at all) on your system. For the same reasons, their syntax is somewhat incongruous and some use different input and output conventions. Not all are included in the command summary in the appendix below. See the relevant manual pages for more details.

sort      sort into alphabetical order
sort -n   sort into numerical order
sort -m   merge sorted files into one sorted file
sort -r   sort into reverse order (highest first)
sort -c   check a file is already sorted
uniq      remove duplicate lines (or partly-duplicate lines)
uniq -d   output only duplicate lines
uniq -c   count identical lines (or lines with identical fields)
grep      find lines containing given string or pattern
grep -v   find lines not containing given string or pattern
grep -c   count lines containing given string or pattern
grep -n   give line numbers of lines containing...
fgrep     same as grep except that it does not recognise regular expressions
egrep     same as grep except that it recognises all REs (grep only recognises certain special characters)
wc -c     count characters
wc -w     count words
wc -l     count lines

NOTE wc -l file will output the number of lines in the file, and the file name.
  43. 43. wc -l < file just gives the bare line count.
head -17        output first 17 lines
tail -17        output last 17 lines
tail +30        output from line 30 (written tail -n +30 on current systems)
cut -f3         delete all but third field of each line
cut -f3,5       delete all but third and fifth fields of each line
cut -f3-5,7     delete all but 3rd, 4th, 5th, 7th fields of each line
cut -c-4,6-8    delete all but 1st to 4th and 6th to 8th characters
cut -f2 -d":"   deletes all but the second field where ":" is the field delimiter (tab is the default)
paste           combines files horizontally; corresponding lines are appended
paste -d">"     pastes with delimiter defined as ">" (tab is default). The special characters "\n" (newline) and "\0" (null string) may be used.
cat             concatenates files vertically (appends files to one another)
cat -n          precedes each line with a line number in the output
cat -b          as above, but does not number blank lines
cat -s          reduces any number of successive blank lines to one blank line
tr "abc-e" "kmx-z"   translates a, b, c, d, e to k, m, x, y, z respectively.
tr -d "xy"      deletes all occurrences of x and y
tr -s "a" "b"   translates all a to b and reduces any string of consecutive b to just one b.
To work at the level of characters rather than fields, sed is simplest for line-by-line processing. sed looks for patterns, so is not very good with column or field positions. uniq needs an already-sorted file. A common idiom is sort | uniq
  44. 44. to produce a sorted list of all the different lines in a file. uniq has a peculiar way of spacing its output, so it is difficult to use in a pipeline with another command such as cut. tr is useful for converting blanks to newlines (hence converting a text to a vertical list of words, which can then be sorted, counted etc.). The command: % tr " " "\012" < filename will do this. \012 is the octal code for the linefeed character. This is also useful for converting strings of blanks or tabs to single characters. \011 is the octal code for the tab character. -------------------------------------------------------------------------------- PRACTICE Try out the following pipeline on a text file: -------------------------------------------------------------------------------- tr " " "\012" < input_file | sort | uniq > output_file -------------------------------------------------------------------------------- Using language corpora A corpus (plural corpora) is a collection of language data. The corpora with which we will be concerned here are electronic, that is they are stored in a computer. Corpora may contain data about written or spoken language. They usually contain texts from one language, but they may also be multilingual. Corpora are usually designed and collated for a specific purpose. Many of the major corpora in use today aim to be representative of different domains of language use, and can facilitate comparative studies. For example, the average length of words in academic texts and newspaper reports could be compared by measuring words in texts from these two domains. Computers obviously make this type of number-crunching (or word-crunching) activity much easier than it would be if you had to count words and letters in a printed text. Corpora are particularly useful for checking the intuitions that we have and the generalisations that are made about language use. 
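The tr pipeline from the practice above can be extended into a simple word-frequency count, which is the kind of word-crunching just described. This is a sketch only: the two-line input text is invented, and a real corpus file would be redirected in instead. It uses tr -s so that runs of blanks do not produce empty lines, and uniq -c to count each word.

```shell
#!/bin/sh
# Invented input; in practice use: tr -s ' ' '\012' < input_file | ...
text='the cat saw the dog
the dog ran'

# 1. tr -s turns each run of blanks into a single newline (one word per line)
# 2. sort brings identical words together, as uniq requires
# 3. uniq -c prefixes each distinct word with the number of times it occurs
# 4. sort -rn puts the most frequent words first
freq=$(printf '%s\n' "$text" | tr -s ' ' '\012' | sort | uniq -c | sort -rn)
printf '%s\n' "$freq"
```

The most frequent word ("the", three times) comes out on the first line. The padding that uniq -c puts in front of each count varies between systems, which is the spacing problem mentioned above.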
The commands learned in this course can be issued directly or combined in simple scripts to extract information from language corpora. Types of Corpora There are many types of corpora, defined by the types of language that they represent and the formats in which that information is stored. Unix commands for handling strings are
  45. 45. sufficiently flexible to handle many different formats. Users however need to be sensitive to the arcane minutiae of the format and markup of the different corpora that they use. The ex command l (typed as :l in the vi editor) can be used to view hidden characters (such as tabs and trailing spaces) in a line.
The LOB and Brown corpora
Brown and LOB are parallel corpora, with very similar formats and tagging. Brown, which was constructed first, represents different types of written American English. LOB represents the same categories of British English. All words are lemmatised and given a word class tag. Here is a sample from the so-called 'vertical tagged' version of Brown:
^N01002001 ----- ----- -----
N01002010 - NP Alastair
N01002020 - BEDZ was
N01002030 - AT a
N01002040 - NN bachelor
N01002041 - . .
^N01002042 ----- ----- -----
N01002050 - ABN all
N01002060 - PP$ his
N01002070 - NN life
N01002080 - PP3A he
N01002090 - HVD had
N01002100 - BEN been
N01002110 - VBN inclined
N01002120 - TO to
N01003010 - VB regard
N01003020 - NNS women
N01003030 - IN as
N01003040 - PN something
N01003050 - WDTR which
N01003060 - MD must
N01003070 - RB necessarily
N01003080 - BE be
N01003090 - VBN subordinated
N01003100 - IN to
N01004010 - PP$ his
And the 'untagged' version of the same passage, plus the following lines:
N01 0010 DAN MORGAN TOLD HIMSELF HE WOULD FORGET Ann Turner. He
N01 0020 was well rid of her. He certainly didn't want a wife who was fickle
N01 0030 as Ann. If he had married her, he'd have been asking for trouble.
  46. 46. N01 0040 But all of this was rationalization. Sometimes he woke up in
N01 0050 the middle of the night thinking of Ann, and then could not get back
N01 0060 to sleep. His plans and dreams had revolved around her so much and for
N01 0070 so long that now he felt as if he had nothing. The easiest thing would
N01 0080 be to sell out to Al Budd and leave the country, but there was
N01 0090 a stubborn streak in him that wouldn't allow it. The best antidote
N01 0100 for the bitterness and disappointment that poisoned him was hard
N01 0110 work. He found that if he was tired enough at night, he went to sleep
Users can choose the version (from those available to them) which includes the information that they need. If you are only interested in word frequencies, then the grammatical information encoded in the tagged version is redundant, and the untagged version can be used. If however you are looking for the word 'set' used as a noun, then it would be necessary to use a tagged version, so that this word can be differentiated from 'set' used as a verb or adjective.
Processing LOB and Brown
The Susanne corpus
This corpus uses a section of the Brown corpus and marks it up with syntactic information.
N01:0010a - YB <minbrk> - [Oh.Oh]
N01:0010b - NP1m DAN Dan [O[S[Nns:s.
N01:0010c - NP1s MORGAN Morgan .Nns:s]
N01:0010d - VVDv TOLD tell [Vd.Vd]
N01:0010e - PPX1m HIMSELF himself [Nos:i.Nos:i]
N01:0010f - PPHS1m HE he [Fn:o[Nas:s.Nas:s]
N01:0010g - VMd WOULD will [Vdc.
N01:0010h - VV0v FORGET forget .Vdc]
N01:0010i - NP1f Ann Ann [Nns:o.
N01:0010j - NP1s Turner Turner .Nns:o]Fn:o]S]
N01:0010k - YF +. - .
N01:0010m - PPHS1m He he [S[Nas:s.Nas:s]
N01:0020a - VBDZ was be [Vsb.Vsb]
N01:0020b - RR well well [Tn:e[R:h.R:h]
N01:0020c - VVNt rid rid [Vn.Vn]
N01:0020d - IO of of [Po:u.
N01:0020e - PPHO1f her she .Po:u]Tn:e]S]
N01:0020f - YF +. - .
N01:0020g - PPHS1m He he [S[Nas:s.Nas:s]
N01:0020h - RR certainly certainly [R:m.R:m]
N01:0020i - VDD did do [Vde.
N01:0020j - XX +n<apos>t not .
N01:0020k - VV0v want want .Vde]
N01:0020m - AT1 a a [Ns:o101.
N01:0020n - NN1c wife wife .
  47. 47. N01:0020p - PNQSr who who [Fr[Nq:s101.Nq:s101]
The London-Lund corpus
This corpus differs from the others that we have looked at because it is a transcription of spoken English. Intonation is marked.
1 1 1 10 1 1 B 11 ((of ^Spanish)) . graphology#/
1 1 1 20 1 1 A 11 ^w=ell# ./
1 1 1 30 1 1 A 11 ((if)) did ^y/ou _set _that# - /
1 1 1 40 1 1 B 11 ^well !Joe and _I#/
1 1 1 50 1 1 B 11 ^set it between _us#/
1 1 1 60 1 1 B 11 ^actually !Joe 'set the :paper#/
1 1 1 70 1 1 B 20 and *((3 to 4 sylls))*/
1 1 1 80 1 1 A 11 *^w=ell# ./
1 1 1 90 1 1 A 11 "^m/ay* I _ask#/
1 1 1 100 1 1 A 11 ^what goes !into that paper n/ow#/
1 1 1 110 1 1 A 11 be^cause I !have to adv=ise# ./
1 1 1 120 1 1 A 21 ((a)) ^couple of people who are !doing [dhi: @]/
1 1 1 130 1 1 B 11 well ^what you :d/o#/
1 1 1 140 1 2 B 12 ^is to - - ^this is sort of be:tween the :tw/o of /
1 1 1 140 1 1 B 12 _us# /
1 1 1 150 1 1 B 11 ^what *you* :d/o#/
1 1 1 160 2 1 B 23 is to ^make sure that your 'own . !candidate/
1 1 1 170 1 1 A 11 *^[m]#*/
1 1 1 160 1 2(B 13 is . *.* ^that your . there`s ^something that your /
1 1 1 160 1 1(B 13 :own candidate can :h/andle# - -/
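As a rough sketch of how such a transcript can be queried with the commands covered so far, the following counts the number of lines attributed to each speaker. It assumes that the fields are separated by single spaces exactly as printed in the sample above, so that the speaker letter is the seventh field; the real corpus files may use fixed-width columns, in which case cut -c would be needed instead. The four input lines are copied from the sample.

```shell
#!/bin/sh
# Four lines copied from the sample above (assumed single-space separated).
sample='1 1 1 10 1 1 B 11 ((of ^Spanish)) . graphology#/
1 1 1 20 1 1 A 11 ^w=ell# ./
1 1 1 40 1 1 B 11 ^well !Joe and _I#/
1 1 1 50 1 1 B 11 ^set it between _us#/'

# Field 7 is taken to be the speaker; sort | uniq -c counts lines per speaker.
counts=$(printf '%s\n' "$sample" | cut -d" " -f7 | sort | uniq -c)
printf '%s\n' "$counts"
```

For these four lines the result shows one line for speaker A and three for speaker B.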
  48. 48. CUVOALD
This acronym stands for the Computer Usable Version of the Oxford Advanced Learners Dictionary. There are in fact two versions. The most useful is usually in a file called cuv2.dat, which contains 68742 words including inflected forms and proper nouns. It is most often of use as a wordlist, but the file also contains a phonemic transcription and a part-of-speech tag for every word. Here is a sample of cuv2.dat:
verbs            v3bz                 Kj
verdancy         'v3dnsI              L@
verdant          'v3dnt               OA
verdict          'v3dIkt              K6
verdicts         'v3dIkts             Kj
verdigris        'v3dIgrIs            L@
verdure          'v3dj@R              L@
verge            v3dZ                 I2,K6   3A
verged           v3dZd                Ic,Id   3A
verger           'v3dZ@R              K6
vergers          'v3dZ@z              Kj
verges           'v3dZIz              Ia,Kj   3A
verging          'v3dZIN              Ib      3A
verifiable       'verIfaI@bl          OA
verification     ,verIfI'keISn        M6
verifications    ,verIfI'keISnz       Mj
verified         'verIfaId            Hc,Hd   6A
verifies         'verIfaIz            Ha      6A
verify           'verIfaI             H3      6A
verifying        'verIfaIIN           Hb      6A
verily           'ver@lI              Pu
verisimilitude   ,verIsI'mIlItjud     M6
verisimilitudes  ,verIsI'mIlItjudz    Mj
veritable        'verIt@bl            OA
verities         'verItIz             Mj
verity           'verItI              M8
vermicelli       ,v3mI'selI           L@
vermiform        'v3mIfOm             OA
vermilion        v@'mIlI@n            M6,OA
The coding conventions for the phonemic and syntactic tags are explained in a file that comes with the dictionary. Some examples of applications that use the dictionary can be found in the appendix of this course.
Other texts
Corpus building is currently a growth area, and there are many, many more corpora as well as the above examples. Currently available or under construction are a number of very large corpora, comprehensive corpora aiming to cover all registers of English,
  49. 49. international English corpora, corpora of different languages and specialised corpora covering a single well-defined domain of language. -------------------------------------------------------------------------------- Exercises 1. Find a large text file with a fixed field format (e.g. the Brown or LOB corpora) and inspect the format. Use zcat to view it if necessary. 3. Use cut to strip away the reference material and leave just the text field. 4. Use tr to strip away any tags that are actually in the text (e.g. attached to the words), so that you are left with just the words. 5. Make a sorted wordlist from the file. 6. Combine the above commands in a shell script so that you have a small program for extracting a wordlist. INTRODUCTION TO THE VI SCREEN EDITOR -------------------------------------------------------------------------------- What is vi Vi is a screen editor. This means that you can see part of the file in a window on the screen, and editing operations can be controlled by moving a cursor around the text on screen. Vi works in a different way from the editing functions of modern word processors. Its effective use requires a considerable amount of expertise on the part of the user. The user must be able to remember and manipulate opaquely named one-letter commands that can be combined in an arbitrary variety of different ways. Vi is a screen-based version of ex. Its lack of user-friendliness is largely a result of this. In many ways it still works like a line editor, with complicated commands typed in by the user. The main enhancements on ex are the window, which enables you to constantly view part or all of the file, the visible cursor and the commands that can be issued without moving to the command line. Once you have learned to start vi, you will probably not need to use ex again. Everything that you have learned with ex, you can do with vi. What is more, with vi you have a window and the possibility to use interactive commands. 
The only
  50. 50. time that you might want to use ex now is if you have trouble running a screen-based utility on your terminal.
Using vi
The next section lists the commands needed to start and use vi. In this section, the key concepts underpinning the use of vi are explained so that you can understand what is happening when you use it. The first thing to understand is that there are three modes:
command mode
insert mode
last line mode (or command line mode)
You start in command mode. The commands listed below for moving the cursor and changing the file are entered in command mode. To enter a command simply type it at the keyboard. What you type will not appear anywhere on screen. To abandon a command you have started, you can type <ESC>. If you are not sure which mode you are in at any time you can type <ESC> and return to command mode. When you leave the other modes you return to command mode.
Insert mode is used to enter text. Insert mode is entered by issuing one of a variety of commands that involve entering text. Insert mode must be exited in order to issue more commands. A common mistake is to attempt to enter a command while in insert mode, which results in the command appearing on screen as part of the text.
Last line mode is entered from command mode, and enables the user to type a command on the last line of the screen. Any ex command can be used in this way, simply by typing ':' followed by the command. The current line will be that where the cursor is positioned.
When you start vi you will see a screen similar to the one below. If you are starting a new file, or the file you are editing is less than 18 lines long, then the empty lines in the window will be marked by the '~' (tilde) character.
--------------------------------------------------------------------------------
This is a small file called 'vi.prac'.
This is the second and last line.
~
~
~
~
  51. 51. ~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"vi.prac" 2 lines 103 characters
A typical vi screen
Note that it is necessary to press return at the end of each line of text that you enter. Otherwise, vi will interpret all of your text as a single line!
--------------------------------------------------------------------------------
PRACTICE Create a new file, enter several lines of text and save it. Edit an existing file that you have, making several changes.
--------------------------------------------------------------------------------
vi reference
vi modes
command     Normal and initial state. <ESC> cancels partial command
insert      entered by the following commands: a, A, i, I, o, O, c, C, s, S, R. Terminates with <ESC> (or ^C).
last line   entered by :, /, ? or !. Input is read and echoed at the bottom of the screen. Commands executed by <RETURN> or <ESC>, terminated by ^C.
Entering and leaving vi
% vi file         edit file
% vi +n file      edit starting at line n
% vi + file       edit starting at end
% vi +/RE/ file   edit starting at RE
% view file       read only mode
ZZ                exit from vi, saving changes (same as :wq)
^Z                stop vi process, for later resumption
Some simple commands
  52. 52. The following are examples of some compound commands, using the operators listed later.
dw            delete word
de            delete word leaving punctuation
dd            delete line
4dd           delete 4 lines
xp            transpose characters
cwtext<ESC>   change word to text
File manipulation
The following are all last line mode commands, so must be preceded by a colon.
w           save changes
wq          save and quit
q           quit
q!          quit, discarding changes
e file      edit file
e!          re-edit current file, discarding changes
w file      write to file
w! file     overwrite file
! command   execute shell command, then return
f           show current file and line
Positioning within the file
^F     forward one screenful
^B     back one screenful
^D     scroll down half screen
^U     scroll up half screen
nG     go to line n (last line default)
/RE/   go to next occurrence of RE
%      find matching bracket
Marking
``   return to previous cursor position
mx   mark position with x
`x   go to mark x
Line positioning
H          top line of window (home)
M          middle line of window
L          last line of window
+          next line, at first non-white character
-          previous line, at first non-white character
<RETURN>   same as +
j          next line, same column (same as down arrow)
k          previous line, same column (same as up arrow)
Character positioning
0   beginning of line
^   first non-white in line
  53. 53. $         end of line
<SPACE>   forward (same as right arrow)
fx        find x forwards in current line
Fx        find x backwards in current line
;         repeat last find command forwards
,         repeat last find command backwards
n|        go to column n
Words, sentences, paragraphs
w   forward to start of next word (delimited by non-alphanumeric character)
b   back to start of last word
e   forward to end of next word
W   as w, with word delimited by blank only
B   as b, with word delimited by blank only
E   as e, with word delimited by blank only
)   forward to start of next sentence
(   back to start of last sentence
}   forward to start of next paragraph
{   back to start of last paragraph
Corrections during insert
^H      erase last character (or your usual delete key)
^W      erase last word
\       escape character
<ESC>   ends insert; back to command mode
^C      ends insert
Insert and replace commands
a    append after cursor
i    insert before cursor
A    append at end of line
I    insert before first non-blank
o    open line below current line
O    open line above current line
rx   replace single character with x
R    replace characters
Operators
The following can be doubled to apply to a line and also preceded by a number to indicate a number of lines. They can be combined with positional commands (e.g. d$ to delete to end of line).
d   delete
c   change
y   yank
Miscellaneous operations
x   delete character
X   delete character to left of cursor
C   change rest of line (same as c$)
D   delete rest of line (same as d$)
  54. 54. J   join lines
Y   yank (copy) lines
Yank and put
p     put back after cursor
P     put back before cursor
"xp   put from buffer x
"xy   yank to buffer x
"xd   delete to buffer x
Undo, redo and retrieve
u     undo last change
U     restore current line
.     repeat last command
"np   retrieve nth last delete
TEXT FORMATTING
--------------------------------------------------------------------------------
There are text formatting facilities available with all Unix implementations. They will not be investigated in any detail here. Many users will prefer to use a PC-based word processing package for document production. Those who want to format text on Unix will have vastly differing needs, and it would be impossible to go into all of the possibilities here. A flavour of the simpler programs is given here, and users can look elsewhere for more extensive documentation.
pr
This is a filter that will format a text, giving a choice of columns, page width, length etc. It is not capable of sophisticated formatting for document production.
nroff
The simplest of the proper formatters is nroff. You can format a plain text file with nroff, by simply typing: % nroff text_file
Formatting commands can be inserted into text files. Some simple commands:
.ce   centre text
.ll   line length
.pl   page length
.po   page offset (left margin)
.sp   blank line
These commands may be followed by a numerical argument, which will make the command apply to the specified number of lines, e.g. .sp 3 to leave three blank lines. Formatting commands must be placed at the beginning of a line to be recognised as such. Normally they appear as the only text on a line. Commands are normally composed of lower-case characters. Here is an example of a text containing some nroff instructions:
  55. 55. .ce
This is the title
.sp 2
And this is the text, which will be formatted and justified when I run nroff. You will see that the line breaks will change, and the text will look tidier. That is what formatting is all about.
.sp
That was a blank line.
The following is what the output from this file would look like:
This is the title
And this is the text, which will be formatted and justified when I run nroff. You will see that the line breaks will change, and the text will look tidier. That is what formatting is all about.
That was a blank line.
nroff macros
Macros are a special type of nroff command, identified by being in upper-case characters. Standard macro libraries can be invoked by using option flags with the nroff command, e.g.: nroff -ms filename for the standard macros. Other macro libraries can be invoked by the me, mn and mv options. Here are some standard macros:
.FS   footnote starts
.FE   footnote ends
.ND   no date
.TL   title
.PP   start paragraph
The .PP tag, for example, is the equivalent of the following sequence of ordinary nroff instructions:
.sp 5
.ce 1
.sp 5
It is possible to write your own macros. More details on nroff can be found in the manual.
MORE ON THE SHELL
  56. 56. --------------------------------------------------------------------------------
General
The role of the shell
A Unix shell is used to:
evaluate the command line. For example:
% car nofile
car: Command not found
Here the shell looks for a command called car. Since it cannot find this command it gives an error message.
perform variable substitution. For example:
% echo "In directory $HOME"
In directory /home/sunserv1_b/lnp5jb
Here the shell variable $HOME is evaluated and displayed.
handle pipelines. For example:
% who | wc -l
Here the output from who is piped through to the wc command which displays a count of the number of lines in its input.
Types of shells
A number of shells are available for Unix systems, including:
Bourne shell
C shell
Korn shell
Graphical User Interface (GUI) shells
The Bourne shell, which was developed by Steve Bourne at Bell Laboratories, is one of the oldest shells and, as such, has gained a lot of popularity. It is widely used for shell programming because of its efficiency and because it is available on all Unix systems. The C shell provides sophisticated interactive capabilities lacking in the Bourne shell. The C shell, which was developed at the University of California, Berkeley, has a syntax
  57. 57. which resembles the C language. Features of the C shell include a command history buffer, command aliases and file name completion. However the C shell does not allow efficient shell programs (also known as scripts) to be written. Because C shell programs are written in a style similar to the C programming language, people who are unfamiliar with C may find the C shell difficult to program in. The Korn shell combines the best features of the Bourne and C shells. Korn scripts are 95% upwardly compatible with Bourne scripts. The Korn shell interactive features include:
in-line editing
command editing
job control
Graphical User Interface (GUI) shells provide an iconic interface to Unix. GUI shells require the use of workstations (or powerful microcomputers) which perform part of the processing locally. The use of GUIs such as X-Windows is likely to become increasingly important in the near future. GUIs currently available include:
Sun View       A Sun-specific GUI
Open Look      GUI standard supported by Sun
Motif          GUI standard supported by other suppliers
Vista eXceed   Available on PCs; similar in style to Motif
There is a battle currently taking place in the market-place to establish the standard GUI.
Recommended shells
The Bourne shell is the oldest shell, and is widely used. The C shell has more utilities, however, and is probably more widely used now.
--------------------------------------------------------------------------------
The default shell for interactive shells at Leeds is the C shell. The Bourne shell is the default for shell programs.
--------------------------------------------------------------------------------
  58. 58. However the Bourne shell is recommended for shell programs. The Korn shell is not widely available and is not a standard part of Unix, but is perhaps the best option if available, unless you want to do a lot of C programming. You can change your default login shell using the command:
% chsh username /bin/sh     Bourne shell
% chsh username /bin/csh    C shell
% chsh username /bin/ksh    Korn shell
Warning! You probably don't want to try these commands now.
C shell features
The history mechanism
The history mechanism enables previously typed Unix commands to be re-invoked and edited. There are two forms. One is the quick substitution, which acts only on the immediately preceding command, e.g.:
% car message
car: Command not found
% ^r^t
This is the message file
This command replaces the first occurrence of 'r' with 't' in the last command. A list of previously entered commands can be displayed using the history command:
% history
1 cd texts
2 vi lookup
3 who
4 history
Commands can be re-entered using the number. For example:
% !2
will re-execute the second command (vi lookup). It is possible to add extra options to commands re-executed. For example to redirect output from the who command to a file called list we could give the command (for the above list):
% !3 > list
You may also edit previous commands e.g.:
% !2:s/vi/cat/
cat lookup
  59. 59. although it is usually easier to re-type the whole command. The last command may be referred to as !!, and you can count back using !-2, !-3 etc.
File name completion
Within the C shell when a file name is used in a command it is possible to specify only as many characters as will uniquely identify the file, and then press the <ESC> key to complete the filename:
% ls
mbox message
% cat me<ESC>
This is the message file
When you type <ESC>, the file name will be extended to 'message' on screen.
Command aliases
Command aliases provide a way of customising commands. For example:
% alias dir ls
% dir
mbox message
Note that command aliases are only valid during the execution of the current shell. It is normal practice to include alias definitions in your .cshrc file. The following aliases could be useful to shorten long command names:
alias hh history
alias ll 'ls -al'
alias q logout
The quotes around ls -al are necessary because of the space in the command. This tells the shell that it is all one command.
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Put the above aliases in your .cshrc file. Think of some other aliases that you would use, such as shortened versions of commands or different names for commands that you will find easier to remember.
  60. 60. C shell startup files
Certain files are executed automatically. These are:
.cshrc file
Executed whenever a new C shell is spawned.
Useful for specifying command aliases.
Since C shells may be spawned automatically by certain system commands (such as the mail system or a compiler), this file should NOT contain commands which send output to your terminal.
Contains a list of directories that are searched for commands. A line in the .cshrc file will give a value to the PATH system variable. The user can add pathnames to this list. It is conventional to store any of your own commands or shell scripts that you will use frequently in a directory called bin, and to add ~/bin to your search path.
.login file
Executed when you login.
Use for setting system wide variables, such as your terminal type.
Can be used to display information, such as who is logged on, or news from the system managers.
Shell processes
A process is an executing program. To display a list of processes use the ps command:
% ps
PID    TTY    TIME  COMMAND
23268  ttyp1  0:01  ps
22520  ttyp1  0:00  csh
The PID specifies the Process Identifier. The 'time' field gives the amount of CPU time used by the process.
Background processes
Normally processes run interactively, but they may also be run in the background, to enable the user to do something else while a process is running (this is known as 'multitasking'). This is usually necessary when you are running a very long job. To run a command in the background use the & character at the end of the command line, as follows:
  61. 61. % command &
Note that output from command will still be sent to standard output. If you fail to redirect standard output it will be sent to your terminal where it is likely to be confused with output from your interactive process. For example, to sort logged on users using a background process give the command:
% who | sort > sortedwho &
Note that this would normally be a very short process and you would not in fact need to run it in the background.
Controlling processes
You may wish to terminate a background process. To do this you must first find out its process id (PID) using ps:
% ps
PID    TTY    TIME  COMMAND
23397  ttyp1  0:01  who
23268  ttyp1  0:02  ps
22520  ttyp1  0:00  csh
Then use the kill command to terminate your process. For example:
% kill 23397
If the process continues use the -9 argument:
% kill -9 23397
Another way of displaying your background processes is to use the jobs command:
% jobs
[1] + Running who | sort > sortedwho
The background process (or 'job') has been assigned the number 1, and this can be used to refer to it instead of the process id. The job number is usually identified by preceding it with the '%' (per cent) character, so as to differentiate it from a process id. So, for example, the command: % kill %1