
Probabilistic Methods of Signal and System Analysis, 3/e stresses the engineering applications of probability theory, presenting the material at a level and in a manner ideally suited to engineering students at the junior or senior level. It is also useful as a review for graduate students and practicing engineers.

Thoroughly revised and updated, this third edition incorporates increased use of the computer in both text examples and selected problems. It utilizes MATLAB as a computational tool and includes new sections relating to Bernoulli trials, correlation of data sets, smoothing of data, computer computation of correlation functions and spectral densities, and computer simulation of systems. All computer examples can be run using the Student Version of MATLAB. Almost all of the examples and many of the problems have been modified or changed entirely, and a number of new problems have been added. A separate appendix discusses and illustrates the application of computers to signal and system analysis.


Probabilistic Methods of Signal and System Analysis
THIRD EDITION
George R. Cooper
Clare D. McGillem
THE OXFORD SERIES IN ELECTRICAL AND COMPUTER ENGINEERING

SERIES EDITORS
Adel S. Sedra, Series Editor, Electrical Engineering
Michael R. Lightner, Series Editor, Computer Engineering

SERIES TITLES
Allen and Holberg, CMOS Analog Circuit Design
Bobrow, Elementary Linear Circuit Analysis, 2nd Ed.
Bobrow, Fundamentals of Electrical Engineering, 2nd Ed.
Campbell, The Science and Engineering of Microelectronic Fabrication
Chen, Analog & Digital Control System Design
Chen, Linear System Theory and Design, 3rd Ed.
Chen, System and Signal Analysis, 2nd Ed.
Comer, Digital Logic and State Machine Design, 3rd Ed.
Cooper and McGillem, Probabilistic Methods of Signal and System Analysis, 3rd Ed.
Franco, Electric Circuits Fundamentals
Fortney, Principles of Electronics: Analog & Digital
Granzow, Digital Transmission Lines
Guru and Hiziroglu, Electric Machinery & Transformers, 2nd Ed.
Hoole and Hoole, A Modern Short Course in Engineering Electromagnetics
Jones, Introduction to Optical Fiber Communication Systems
Krein, Elements of Power Electronics
Kuo, Digital Control Systems, 3rd Ed.
Lathi, Modern Digital and Analog Communications Systems, 3rd Ed.
McGillem and Cooper, Continuous and Discrete Signal and System Analysis, 3rd Ed.
Miner, Lines and Electromagnetic Fields for Engineers
Roberts and Sedra, SPICE, 2nd Ed.
Roulston, An Introduction to the Physics of Semiconductor Devices
Sadiku, Elements of Electromagnetics, 2nd Ed.
Santina, Stubberud, and Hostetter, Digital Control System Design, 2nd Ed.
Schwarz, Electromagnetics for Engineers
Schwarz and Oldham, Electrical Engineering: An Introduction, 2nd Ed.
Sedra and Smith, Microelectronic Circuits, 4th Ed.
Stefani, Savant, Shahian, and Hostetter, Design of Feedback Control Systems, 3rd Ed.
Van Valkenburg, Analog Filter Design
Warner and Grung, Semiconductor Device Electronics
Wolovich, Automatic Control Systems
Yariv, Optical Electronics in Modern Communications, 5th Ed.
CONTENTS

Preface
1 Introduction to Probability
  1-1 Engineering Applications of Probability
  1-2 Random Experiments and Events
  1-3 Definitions of Probability
  1-4 The Relative-Frequency Approach
  1-5 Elementary Set Theory
  1-6 The Axiomatic Approach
  1-7 Conditional Probability
  1-8 Independence
  1-9 Combined Experiments
  1-10 Bernoulli Trials
  1-11 Applications of Bernoulli Trials
  Problems
  References
2 Random Variables
  2-1 Concept of a Random Variable
  2-2 Distribution Functions
  2-3 Density Functions
  2-4 Mean Values and Moments
  2-5 The Gaussian Random Variable
  2-6 Density Functions Related to Gaussian
  2-7 Other Probability Density Functions
  2-8 Conditional Probability Distribution and Density Functions
  2-9 Examples and Applications
  Problems
  References
3 Several Random Variables
  3-1 Two Random Variables
  3-2 Conditional Probability Revisited
  3-3 Statistical Independence
  3-4 Correlation between Random Variables
  3-5 Density Function of the Sum of Two Random Variables
  3-6 Probability Density Function of a Function of Two Random Variables
  3-7 The Characteristic Function
  Problems
  References
4 Elements of Statistics
  4-1 Introduction
  4-2 Sampling Theory: The Sample Mean
  4-3 Sampling Theory: The Sample Variance
  4-4 Sampling Distributions and Confidence Intervals
  4-5 Hypothesis Testing
  4-6 Curve Fitting and Linear Regression
  4-7 Correlation between Two Sets of Data
  Problems
  References
5 Random Processes
  5-1 Introduction
  5-2 Continuous and Discrete Random Processes
  5-3 Deterministic and Nondeterministic Random Processes
  5-4 Stationary and Nonstationary Random Processes
  5-5 Ergodic and Nonergodic Random Processes
  5-6 Measurement of Process Parameters
  5-7 Smoothing Data with a Moving Window Average
  Problems
  References
6 Correlation Functions
  6-1 Introduction
  6-2 Example: Autocorrelation Function of a Binary Process
  6-3 Properties of Autocorrelation Functions
  6-4 Measurement of Autocorrelation Functions
  6-5 Examples of Autocorrelation Functions
  6-6 Crosscorrelation Functions
  6-7 Properties of Crosscorrelation Functions
  6-8 Examples and Applications of Crosscorrelation Functions
  6-9 Correlation Matrices for Sampled Functions
  Problems
  References
7 Spectral Density
  7-1 Introduction
  7-2 Relation of Spectral Density to the Fourier Transform
  7-3 Properties of Spectral Density
  7-4 Spectral Density and the Complex Frequency Plane
  7-5 Mean-Square Values from Spectral Density
  7-6 Relation of Spectral Density to the Autocorrelation Function
  7-7 White Noise
  7-8 Cross-Spectral Density
  7-9 Autocorrelation Function Estimate of Spectral Density
  7-10 Periodogram Estimate of Spectral Density
  7-11 Examples and Applications of Spectral Density
  Problems
  References
8 Response of Linear Systems to Random Inputs
  8-1 Introduction
  8-2 Analysis in the Time Domain
  8-3 Mean and Mean-Square Value of System Output
  8-4 Autocorrelation Function of System Output
  8-5 Crosscorrelation between Input and Output
  8-6 Examples of Time-Domain System Analysis
  8-7 Analysis in the Frequency Domain
  8-8 Spectral Density at the System Output
  8-9 Cross-Spectral Densities between Input and Output
  8-10 Examples of Frequency-Domain Analysis
  8-11 Numerical Computation of System Output
  Problems
  References
9 Optimum Linear Systems
  9-1 Introduction
  9-2 Criteria of Optimality
  9-3 Restrictions on the Optimum System
  9-4 Optimization by Parameter Adjustment
  9-5 Systems That Maximize Signal-to-Noise Ratio
  9-6 Systems That Minimize Mean-Square Error
  Problems
  References
Appendices
  A Mathematical Tables
    A-1 Trigonometric Identities
    A-2 Indefinite Integrals
    A-3 Definite Integrals
    A-4 Fourier Transform Operations
    A-5 Fourier Transforms
    A-6 One-Sided Laplace Transforms
  B Frequently Encountered Probability Distributions
    B-1 Discrete Probability Functions
    B-2 Continuous Distributions
  C Binomial Coefficients
  D Normal Probability Distribution Function
  E The Q-Function
  F Student's t Distribution Function
  G Computer Computations
  H Table of Correlation Function-Spectral Density Pairs
  I Contour Integration
Index
PREFACE

The goals of the Third Edition are essentially the same as those of the earlier editions, viz., to provide an introduction to the applications of probability theory to the solution of problems arising in the analysis of signals and systems that is appropriate for engineering students at the junior or senior level. However, it may also serve graduate students and engineers as a concise review of material that they previously encountered in widely scattered sources.

This edition differs from the first and second in several respects. In this edition use of the computer is introduced both in text examples and in selected problems. The computer examples are carried out using MATLAB¹, and the problems are such that they can be handled with the Student Edition of MATLAB as well as with other computer mathematics applications. In addition to the introduction of computer usage in solving problems involving statistics and random processes, other changes have also been made. In particular, a number of new sections have been added, virtually all of the exercises have been modified or changed, a number of the problems have been modified, and a number of new problems have been added.

Since this is an engineering text, the treatment is heuristic rather than rigorous, and the student will find many examples of the application of these concepts to engineering problems. However, it is not completely devoid of the mathematical subtleties, and considerable attention has been devoted to pointing out some of the difficulties that make a more advanced study of the subject essential if one is to master it. The authors believe that the educational process is best served by repeated exposure to difficult subject matter; this text is intended to be the first exposure to probability and random processes and, we hope, not the last.
The book is not comprehensive, but deals selectively with those topics that the authors have found most useful in the solution of engineering problems.

A brief discussion of some of the significant features of this book will help set the stage for a discussion of the various ways it can be used. Elementary concepts of discrete probability are introduced in Chapter 1: first from the intuitive standpoint of the relative frequency approach and then from the more rigorous standpoint of axiomatic probability. Simple examples illustrate all these concepts and are more meaningful to engineers than are the traditional examples of selecting red and white balls from urns. The concept of a random variable is introduced in Chapter 2 along with the ideas of probability distribution and density functions, mean values, and conditional probability. A significant feature of this chapter is an extensive discussion of many different probability density functions and the physical situations in which they may occur. Chapter 3 extends the random variable concept to situations involving two or more random variables and introduces the concepts of statistical independence and correlation.

¹ MATLAB is the registered trademark of The MathWorks, Inc., Natick, MA.

In Chapter 4, sampling theory, as applied to statistical estimation, is considered in some detail and a thorough discussion of sample mean and sample variance is given. The distribution of the sample is described, and the use of confidence intervals in making statistical decisions is both considered and illustrated by many examples of hypothesis testing. The problem of fitting smooth curves to experimental data is analyzed, and the use of linear regression is illustrated by practical examples. The problem of determining the correlation between data sets is examined.

A general discussion of random processes and their classification is given in Chapter 5. The emphasis here is on selecting probability models that are useful in solving engineering problems. Accordingly, a great deal of attention is devoted to the physical significance of the various process classifications, with no attempt at mathematical rigor. A unique feature of this chapter, which is continued in subsequent chapters, is an introduction to the practical problem of estimating the mean of a random process from an observed sample function. The technique of smoothing data with a moving window is discussed.

Properties and applications of autocorrelation and crosscorrelation functions are discussed in Chapter 6. Many examples are presented in an attempt to develop some insight into the nature of correlation functions. The important problem of estimating autocorrelation functions is discussed in some detail and illustrated with several computer examples.

Chapter 7 turns to a frequency-domain representation of random processes by introducing the concept of spectral density.
Unlike most texts, which simply define spectral density as the Fourier transform of the correlation function, this book adopts a more fundamental approach in order to bring out the physical significance of the concept. This chapter is the most difficult one in the book, but the authors believe the material should be presented in this way. Methods of estimating the spectral density from the autocorrelation function and from the periodogram are developed and illustrated with appropriate computer-based examples. The use of window functions to improve estimates is illustrated, as is the use of the computer to carry out integration of the spectral density using both the real and complex frequency representations.

Chapter 8 utilizes the concepts of correlation functions and spectral density to analyze the response of linear systems to random inputs. In a sense, this chapter is a culmination of all that preceded it, and it is particularly significant to engineers who must use these concepts. It contains many examples that are relevant to engineering problems and emphasizes the need for mathematical models that are both realistic and manageable. The computation of system output through simulation is examined and illustrated with computer examples.

Chapter 9 extends the concepts of systems analysis to consider systems that are optimum in some sense. Both the classical matched filter for known signals and the Wiener filter for random signals are considered from an elementary standpoint. Computer examples of optimization are considered and illustrated with an example of an adaptive filter.

Several appendices are included to provide useful mathematical and statistical tables and data. Appendix G contains a detailed discussion, with examples, of the application of computers to the analysis of signals and systems and can serve as an introduction to some of the ways MATLAB can be used to solve such problems.
In a more general vein, each chapter contains references that the reader may use to extend his or her knowledge. There is also a wide selection of problems at the end of each chapter. A solution manual for these problems is available to the instructor.

As an additional aid to learning and using the concepts and methods discussed in this text, there are exercises at the end of each major section. The reader should consider these exercises as part of the reading assignment and should make every effort to solve each one before going on to the next section. Answers are provided so that the reader may know when his or her efforts have been successful. It should be noted, however, that the answers to each exercise may not be listed in the same order as the questions. This is intended to provide an additional challenge. The presence of these exercises should substantially reduce the number of additional problems that need to be assigned by the instructor.

The material in this text is appropriate for a one-semester, three-credit course offered in the junior year. Not all sections of the text need be used in such a course, but 90% of it can be covered in reasonable detail. Sections that may be omitted include 3-6, 3-7, 5-7, 6-4, 6-9, 7-9, and part of Chapter 9; but other choices may be made at the discretion of the instructor. There are, of course, many other ways in which the text material could be utilized. For those schools on a quarter system, the material noted above could be covered in a four-credit course. Alternatively, if a three-credit course were desired, it is suggested that, in addition to the omissions noted above, Sections 1-5, 1-6, 1-7, 1-9, 2-6, 3-5, 7-2, 7-8, 7-10, 8-9, and all of Chapter 9 can be omitted if the instructor supplies a few explanatory words to bridge the gaps.
Obviously, there are also many other possibilities that are open to the experienced instructor.

It is a pleasure for the authors to acknowledge the very substantial aid and encouragement that they have received from their colleagues and students over the years. In particular, special thanks are due to Prof. David Landgrebe of Purdue University for his helpful suggestions regarding incorporation of computer usage in presenting this material.

September 1997
George R. Cooper
Clare D. McGillem
CHAPTER 1
Introduction to Probability

1-1 Engineering Applications of Probability

Before embarking on a study of elementary probability theory, it is essential to motivate such a study by considering why probability theory is useful in the solution of engineering problems. This can be done in two different ways. The first is to suggest a viewpoint, or philosophy, concerning probability that emphasizes its universal physical reality rather than treating it as another mathematical discipline that may be useful occasionally. The second is to note some of the many different types of situations that arise in normal engineering practice in which the use of probability concepts is indispensable.

A characteristic feature of probability theory is that it concerns itself with situations that involve uncertainty in some form. The popular conception of this relates probability to such activities as tossing dice, drawing cards, and spinning roulette wheels. Because the rules of probability are not widely known, and because such situations can become quite complex, the prevalent attitude is that probability theory is a mysterious and esoteric branch of mathematics that is accessible only to trained mathematicians and is of limited value in the real world. Since probability theory does deal with uncertainty, another prevalent attitude is that a probabilistic treatment of physical problems is an inferior substitute for a more desirable exact analysis and is forced on the analyst by a lack of complete information. Both of these attitudes are false.

Regarding the alleged difficulty of probability theory, it is doubtful there is any other branch of mathematics or analysis that is so completely based on such a small number of easily understood basic concepts. Subsequent discussion reveals that the major body of probability theory can be deduced from only three axioms that are almost self-evident.
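For reference, here is one common way of stating these three axioms for events A and B of a sample space S. The book develops its own formulation in Section 1-6, and its notation may differ. For infinite sample spaces, Axiom 3 is usually extended to countably many mutually exclusive events. The last line illustrates how further results follow: the probability of the complement of A is fixed by Axioms 2 and 3 alone.

```latex
\begin{aligned}
&\text{Axiom 1:} && \Pr(A) \ge 0 \\
&\text{Axiom 2:} && \Pr(S) = 1 \\
&\text{Axiom 3:} && \Pr(A \cup B) = \Pr(A) + \Pr(B) \quad \text{if } A \cap B = \varnothing \\[4pt]
&\text{Consequence:} && A \cap \bar{A} = \varnothing,\ A \cup \bar{A} = S
  \;\Rightarrow\; \Pr(A) + \Pr(\bar{A}) = \Pr(S) = 1
  \;\Rightarrow\; \Pr(\bar{A}) = 1 - \Pr(A)
\end{aligned}
```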
Once these axioms and their applications are understood, the remaining concepts follow in a logical manner.

The attitude that regards probability theory as a substitute for exact analysis stems from the current educational practice of presenting physical laws as deterministic, immutable, and strictly true under all circumstances. Thus, a law that describes the response of a dynamic system is supposed to predict that response precisely if the system excitation is known precisely. For example, Ohm's law

$$v(t) = R\,i(t)$$

is assumed to be exactly true at every instant of time, and, on a macroscopic basis, this assumption may be well justified. On a microscopic basis, however, this assumption is patently false, a fact that is immediately obvious to anyone who has tried to connect a large resistor to the input of a high-gain amplifier and listened to the resulting noise.

In the light of modern physics and our emerging knowledge of the nature of matter, the viewpoint that natural laws are deterministic and exact is untenable. They are, at best, a representation of the average behavior of nature. In many important cases this average behavior is close enough to that actually observed so that the deviations are unimportant. In such cases, the deterministic laws are extremely valuable because they make it possible to predict system behavior with a minimum of effort. In other equally important cases, the random deviations may be significant, perhaps even more significant than the deterministic response. For these cases, analytic methods derived from the concepts of probability are essential.

From the above discussion, it should be clear that the so-called exact solution is not exact at all but, in fact, represents an idealized special case that never actually arises in nature. The probabilistic approach, on the other hand, far from being a poor substitute for exactness, is actually the method that most nearly represents physical reality. Furthermore, it includes the deterministic result as a special case.

It is now appropriate to discuss the types of situations in which probability concepts arise in engineering.
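Before turning to those situations, a short simulation can illustrate the claim that a deterministic law describes only average behavior. The Python sketch below is a hypothetical illustration, not an example from the text. It assumes that each measured voltage across a resistor equals R·i plus zero-mean Gaussian noise of an arbitrarily chosen rms value. The resistor value, current, noise level, and the function name `noisy_voltage_samples` are all assumptions. Under this model, any single reading deviates from Ohm's law, while the mean of many readings approaches it.

```python
import random
import statistics

def noisy_voltage_samples(resistance, current, noise_rms, n, seed=0):
    """Return n simulated voltage readings v_k = R*i + n_k.

    n_k is zero-mean Gaussian noise with the given rms value; this noise
    model is an illustrative assumption, not a model taken from the text.
    """
    rng = random.Random(seed)
    return [resistance * current + rng.gauss(0.0, noise_rms) for _ in range(n)]

R = 1.0e6   # ohms: a hypothetical large resistor
I = 1.0e-6  # amperes: a hypothetical constant current
samples = noisy_voltage_samples(R, I, noise_rms=0.05, n=10_000)

print(f"Ohm's law prediction: {R * I:.3f} V")
print(f"first reading:        {samples[0]:.3f} V")
print(f"mean of all readings: {statistics.fmean(samples):.3f} V")
```

Letting `noise_rms` approach zero recovers the deterministic law exactly. This is the sense in which the probabilistic description contains the deterministic result as a special case.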
The examples presented here emphasize situations that arise in systems studies, but they do serve to illustrate the essential point that engineering applications of probability tend to be the rule rather than the exception.

Random Input Signals

For a physical system to perform a useful task, it is usually necessary that some sort of forcing function (the input signal) be applied to it. Input signals that have simple mathematical representations are convenient for pedagogical purposes or for certain types of system analysis, but they seldom arise in actual applications. Instead, the input signal is more likely to involve a certain amount of uncertainty and unpredictability that justifies treating it as a random signal. There are many examples of this: speech and music signals that serve as inputs to communication systems; random digits applied to a computer; random command signals applied to an aircraft flight control system; random signals derived from measuring some characteristic of a manufactured product and used as inputs to a process control system; steering wheel movements in an automobile power-steering system; the sequence in which the call and operating buttons of an elevator are pushed; the number of vehicles passing various checkpoints in a traffic control system; outside and inside temperature fluctuations as inputs to a building heating and air conditioning system; and many others.
Random Disturbances

Many systems have unwanted disturbances applied to their input or output in addition to the desired signals. Such disturbances are almost always random in nature and call for the use of probabilistic methods even if the desired signal does not. A few specific cases serve to illustrate several different types of disturbances. If, for a first example, the output of a high-gain amplifier is connected to a loudspeaker, one frequently hears a variety of snaps, crackles, and pops. This random noise arises from thermal motion of the conduction electrons in the amplifier input circuit or from random variations in the number of electrons (or holes) passing through the transistors. It is obvious that one cannot hope to calculate the value of this noise at every instant of time, since this value represents the combined effects of literally billions of individual moving charges. It is possible, however, to calculate the average power of this noise, its frequency spectrum, and even the probability of observing a noise value larger than some specified value. As a practical matter, these quantities are more important in determining the quality of the amplifier than is a knowledge of the instantaneous waveforms.

As a second example, consider a radio or television receiver. In addition to noise generated within the receiver by the mechanisms noted, there is random noise arriving at the antenna. This results from distant electrical storms, manmade disturbances, radiation from space, or thermal radiation from surrounding objects. Hence, even if perfect receivers and amplifiers were available, the received signal would be combined with random noise. Again, the calculation of such quantities as average power and frequency spectrum may be more significant than the determination of instantaneous value.

A different type of system is illustrated by a large radar antenna, which may be pointed in any direction by means of an automatic control system.
The wind blowing on the antenna produces random forces that must be compensated for by the control system. Since the compensation is never perfect, there is always some random fluctuation in the antenna direction; it is important to be able to calculate the effective value and frequency content of this fluctuation.

A still different situation is illustrated by an airplane flying in turbulent air, a ship sailing in stormy seas, or an army truck traveling over rough terrain. In all these cases, random disturbing forces, acting on complex mechanical systems, interfere with the proper control or guidance of the system. It is essential to determine how the system responds to these random input signals.

Random System Characteristics

The system itself may have characteristics that are unknown and that vary in a random fashion from time to time. Some typical examples are aircraft in which the load (that is, the number of passengers or the weight of the cargo) varies from flight to flight; troposcatter communication systems in which the path attenuation varies radically from moment to moment; an electric power system in which the load (that is, the amount of energy being used) fluctuates randomly; and a telephone system in which the number of users changes from instant to instant.

There are also many electronic systems in which the parameters may be random. For example, it is customary to specify the properties of many solid-state devices such as diodes, transistors, digital gates, shift registers, flip-flops, etc. by listing a range of values for the more important items. The actual values of the parameters are random quantities that lie somewhere in this range but are not known a priori.

System Reliability

All systems are composed of many individual elements, and one or more of these elements may fail, thus causing the entire system, or part of the system, to fail. The times at which such failures will occur are unknown, but it is often possible to determine the probability of failure for the individual elements and from these to determine the "mean time to failure" for the system. Such reliability studies are deeply involved with probability and are extremely important in engineering design. As systems become more complex and more costly and contain larger numbers of elements, the problems of reliability become more difficult and take on added significance.

Quality Control

An important method of improving system reliability is to improve the quality of the individual elements, and this can often be done by an inspection process. As it may be too costly to inspect every element after every step during its manufacture, it is necessary to develop rules for inspecting elements selected at random. These rules are based on probabilistic concepts and serve the valuable purpose of maintaining the quality of the product with the least expense.

Information Theory

A major objective of information theory is to provide a quantitative measure for the information content of messages such as printed pages, speech, pictures, graphical data, numerical data, or physical observations of temperature, distance, velocity, radiation intensity, and rainfall. This quantitative measure is necessary to provide communication channels that are both adequate and efficient for conveying this information from one place to another. Since such messages and observations are almost invariably unknown in advance and random in nature, they can be described only in terms of probability. Hence, the appropriate information measure is a probabilistic one.
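To show what a probabilistic information measure looks like, the sketch below computes Shannon's entropy, H = -Σ p log₂ p, for a discrete source. Entropy is the standard measure in information theory, but this book does not develop it. The function name `entropy_bits` and the three example sources are hypothetical, chosen only for illustration.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete source, in bits per symbol.

    Symbols with p == 0 are skipped, since p * log2(p) -> 0 as p -> 0.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical four-symbol sources.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # 2.0  (all symbols equally likely)
print(entropy_bits([0.5, 0.25, 0.125, 0.125]))  # 1.75 (less uncertainty)
print(entropy_bits([1.0, 0.0, 0.0, 0.0]))       # 0.0  (message known in advance)
```

Of the three sources, the equally likely one has the largest entropy. This matches the intuition that a message carries the most information when it is least predictable, and that a message known in advance carries none.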
Furthermore, the communication channels are subject to random disturbances (noise) that limit their ability to convey information, and again a probabilistic description is required.

Simulation

It is frequently useful to investigate system performance by computer simulation. This can often be carried out successfully even when a mathematical analysis is impossible or impractical. For example, when there are nonlinearities present in a system, it is often not possible to make an exact analysis. However, it is generally possible to carry out a simulation if mathematical expressions for the nonlinearities can be obtained. When inputs have unusual statistical properties, simulation may be the only way to obtain detailed information about system performance. It is possible through simulation to see the effects of applying a wide range of random and nonrandom inputs to a system and to investigate the effects of random variations in component values. Selection of optimum component values can be made by simulation studies when other methods are not feasible.

It should be clear from the above partial listing that almost any engineering endeavor involves a degree of uncertainty or randomness that makes the use of probabilistic concepts an essential tool for the present-day engineer. In the case of system analysis, it is necessary to have some description of random signals and disturbances. There are two general methods of describing random signals mathematically. The first, and most basic, is a probabilistic description in which the random quantity is characterized by a probability model. This method is discussed later in this chapter.

The probabilistic description of random signals cannot be used directly in system analysis, since it indicates very little about how the random signal varies with time or what its frequency spectrum is. It does, however, lead to the statistical description of random signals, which is useful in system analysis. In this case the random signal is characterized by a statistical model, which consists of an appropriate set of average values such as the mean, variance, correlation function, spectral density, and others. These average values represent a less precise description of the random signal than that offered by the probability model, but they are more useful for system analysis because they can be computed by using straightforward and relatively simple methods. Some of the statistical averages are discussed in subsequent chapters.

There are many steps that need to be taken before it is possible to apply the probabilistic and statistical concepts to system analysis.
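Before outlining those steps, two ideas from this discussion can be previewed in a short Python sketch: simulating a nonlinear system, and summarizing its random output by statistical averages. Everything in the sketch is an illustrative assumption rather than an example from the text:

- The input is a correlated Gaussian sequence produced by a first-order recursive filter.
- The nonlinearity is a hard limiter; the name `hard_limiter` is hypothetical.
- The output is summarized by its sample mean, its sample variance, and a time-average estimate of its autocorrelation function at a few lags.

Estimating such averages properly is treated in later chapters.

```python
import random

def hard_limiter(x, limit=1.0):
    """Saturating nonlinearity (hypothetical example): clips x to [-limit, limit]."""
    return max(-limit, min(limit, x))

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_autocorrelation(xs, lag):
    """Time-average estimate of R(lag) = E[x(k) x(k + lag)] from one sample function."""
    n = len(xs) - lag
    return sum(xs[k] * xs[k + lag] for k in range(n)) / n

rng = random.Random(3)

# Hypothetical random input: Gaussian samples passed through a first-order
# recursive filter, so that neighboring input samples are correlated.
x, state = [], 0.0
for _ in range(50_000):
    state = 0.9 * state + rng.gauss(0.0, 1.0)
    x.append(state)

# Simulate the nonlinear system directly; no closed-form analysis is needed.
y = [hard_limiter(v) for v in x]

mean_y = sample_mean(y)
var_y = sample_mean([(v - mean_y) ** 2 for v in y])
print(f"output mean:     {mean_y:.3f}")
print(f"output variance: {var_y:.3f}")
for lag in (0, 1, 5, 20):
    print(f"R_y({lag}) estimate: {sample_autocorrelation(y, lag):.3f}")
```

The averages come straight from the simulated output, even though an analytical model of the clipped signal would be considerably harder to obtain. Changing `limit` or the filter coefficient 0.9 changes the numbers but not the procedure.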
In order that the reader may understand that even the most elementary steps are important to the final objective, it is desirable to outline these steps briefly. The first step is to introduce the concepts of probability by considering discrete random events. These concepts are then extended to continuous random variables and subsequently to random functions of time. Finally, several of the average values associated with random signals are introduced. At this point, the tools are available to consider ways of analyzing the response of linear systems to random inputs.

1-2 Random Experiments and Events

The concepts of experiment and event are fundamental to an understanding of elementary probability concepts. An experiment is some action that results in an outcome. A random experiment is one in which the outcome is uncertain before the experiment is performed. Although there is a precise mathematical definition of a random experiment, a better understanding may be gained by listing some examples of well-defined random experiments and their possible outcomes. This is done in Table 1-1. It should be noted, however, that the possible outcomes often may be defined in several different ways, depending upon the wishes of the experimenter. The initial discussion is concerned with a single performance of a well-defined experiment. This single performance is referred to as a trial.

An important concept in connection with random events is that of equally likely events. For example, if we toss a coin we expect that the event of getting a head and the event of getting a tail are equally likely. Likewise, if we roll a die we expect that the events of getting any number from 1 to 6 are equally likely. Also, when a card is drawn from a deck, each of the 52 cards is equally likely. A term that is often used as a synonym for equally likely is selected at random. For example, when we say that a card is selected at random from a deck, we are implying that all cards in the deck are equally likely to have been chosen. In general, we assume that the outcomes of an experiment are equally likely unless there is some clear physical reason why they should not be. In the discussions that follow, there will be examples of events that are assumed to be equally likely and events that are not assumed to be equally likely. The reader should clearly understand the physical reasons for the assumptions in both cases.

It is also important to distinguish between elementary events and composite events. An elementary event is one for which there is only one outcome. Examples of elementary events include such things as tossing a coin or rolling a die when the events are defined in a specific way. When a coin is tossed, the event of getting a head or the event of getting a tail can be achieved in only one way. Likewise, when a die is rolled the event of getting any integer from 1 to 6 can be achieved in only one way. Hence, in both cases, the defined events are elementary events. On the other hand, it is possible to define events associated with rolling a die that are not elementary. For example, let one event be that of obtaining an even number while another event is that of obtaining an odd number. In this case, each event can be achieved in three different ways and, hence, these events are composite.

There are many different random experiments in which the events can be defined to be either elementary or composite.
For example, when a card is selected at random from a deck of 52 cards, there are 52 elementary events corresponding to the selection of each of the cards. On the other hand, the event of selecting a heart is a composite event containing 13 different outcomes. Likewise, the event of selecting an ace is a composite event containing 4 outcomes. Clearly, there are many other ways in which composite events could be defined.

When the number of outcomes of an experiment is countable (that is, the outcomes can be put in one-to-one correspondence with the integers), the outcomes are said to be discrete. All of the examples discussed above represent discrete outcomes. However, there are many experiments in which the outcomes are not countable. For example, if a random voltage is observed, and the outcome taken to be the value of the voltage, there may be an infinite and noncountable number of possible values that can be obtained. In this case, the outcomes are said to form a continuum. The concept of an elementary event does not apply in this case.

Table 1-1 Possible Experiments and Their Outcomes

Experiment            Possible Outcomes
Flipping a coin       Heads (H), tails (T)
Throwing a die        1, 2, 3, 4, 5, 6
Drawing a card        Any of the 52 possible cards
Observing a voltage   Greater than 0, less than 0
Observing a voltage   Greater than V, less than V
Observing a voltage   Between V1 and V2, not between V1 and V2

It is also possible to conduct more complicated experiments with more complicated sets of events. The experiment may consist of tossing 10 coins, and it is apparent in this case that there are many different possible outcomes, each of which may be an event. Another situation, which has more of an engineering flavor, is that of a telephone system having 10,000 telephones connected to it. At any given time, a possible event is that 2000 of these telephones are in use. Obviously, there are a great many other possible events.

If the outcome of an experiment is uncertain before the experiment is performed, the possible outcomes are random events. To each of these events it is possible to assign a number, called the probability of that event, and this number is a measure of how likely that event is. Usually, these numbers are assumed, the assumed values being based on our intuition about the experiment. For example, if we toss a coin, we would expect that the possible outcomes of heads and tails would be equally likely. Therefore, we would assume the probabilities of these two events to be the same.

1-3 Definitions of Probability

One of the most serious stumbling blocks in the study of elementary probability is that of arriving at a satisfactory definition of the term "probability." There are, in fact, four or five different definitions for probability that have been proposed and used with varying degrees of success. They all suffer from deficiencies in concept or application. Ironically, the most successful "definition" leaves the term probability undefined.

Of the various approaches to probability, the two that appear to be most useful are the relative-frequency approach and the axiomatic approach.
The relative-frequency approach is useful because it attempts to attach some physical significance to the concept of probability and, thereby, makes it possible to relate probabilistic concepts to the real world. Hence, the application of probability to engineering problems is almost always accomplished by invoking the concepts of relative frequency, even when engineers may not be conscious of doing so.

The limitation of the relative-frequency approach is the difficulty of using it to deduce the appropriate mathematical structure for situations that are too complicated to be analyzed readily by physical reasoning. This is not to imply that this approach cannot be used in such situations, for it can, but it does suggest that there may be a much easier way to deal with these cases. The easier way turns out to be the axiomatic approach.

The axiomatic approach treats the probability of an event as a number that satisfies certain postulates but is otherwise undefined. Whether or not this number relates to anything in the real world is of no concern in developing the mathematical structure that evolves from these postulates. Engineers may object to this approach as being too artificial and too removed from reality, but they should remember that the whole body of circuit theory was developed in essentially the same way. In the case of circuit theory, the basic postulates are Kirchhoff's laws and the conservation of energy. The same mathematical structure emerges regardless of what physical quantities are identified with the abstract symbols, or even if no physical quantities are associated with them. It is the task of the engineer to relate this mathematical structure to the real world in a way that is admittedly not exact, but that leads to useful solutions to real problems.

From the above discussion, it appears that the most useful approach to probability for engineers is a two-pronged one, in which the relative-frequency concept is employed to relate simple results to physical reality, and the axiomatic approach is employed to develop the appropriate mathematics for more complicated situations. It is this philosophy that is presented here.

1-4 The Relative-Frequency Approach

As its name implies, the relative-frequency approach to probability is closely linked to the frequency of occurrence of the defined events. For any given event, the frequency of occurrence is used to define a number called the probability of that event, and this number is a measure of how likely that event is. Usually, these numbers are assumed, the assumed values being based on our intuition about the experiment or on the assumption that the events are equally likely.

To make this concept more precise, consider an experiment that is performed $N$ times and for which there are four possible outcomes that are considered to be the elementary events A, B, C, and D. Let $N_A$ be the number of times that event A occurs, with a similar notation for the other events. It is clear that

$$N_A + N_B + N_C + N_D = N \tag{1-1}$$

We now define the relative frequency of A, $r(A)$, as

$$r(A) = \frac{N_A}{N} \tag{1-2}$$

From (1-1) it is apparent that

$$r(A) + r(B) + r(C) + r(D) = 1$$

Now imagine that $N$ increases without limit. When a phenomenon known as statistical regularity applies, the relative frequency $r(A)$ tends to stabilize and approach a number, $\Pr(A)$, that can be taken as the probability of the elementary event A. That is,

$$\Pr(A) = \lim_{N \to \infty} r(A) \tag{1-3}$$

From the relation given above, it follows that

$$\Pr(A) + \Pr(B) + \Pr(C) + \cdots + \Pr(M) = 1 \tag{1-4}$$

and we can conclude that the sum of the probabilities of all of the mutually exclusive events associated with a given experiment must be unity.
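Statistical regularity can be watched numerically. The following is a minimal Python sketch, assuming a fair six-sided die simulated with the standard `random` module (the die and the helper name `relative_frequencies` are illustrative choices, not from the text). It tallies $N_A$ for each face and reports $r(A) = N_A/N$ for increasing $N$; the six relative frequencies always sum to 1, as (1-1) requires, and each tends to settle near 1/6 as $N$ grows.

```python
import random
from collections import Counter

def relative_frequencies(n_trials, seed=1):
    """Hypothetical helper: roll a simulated fair die n_trials times.

    Assumption: a fair die, modeled by random.randint(1, 6).
    Returns {face: r(face)} with r(face) = N_face / N, as in (1-2).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_trials))
    return {face: counts[face] / n_trials for face in range(1, 7)}

for n in (60, 600, 6000, 60000):
    r = relative_frequencies(n)
    # each r(face) should drift toward 1/6 (about 0.1667) as n grows
    print(n, " ".join(f"{r[f]:.4f}" for f in range(1, 7)), f"sum={sum(r.values()):.4f}")
```

The limit in (1-3) cannot be reached by any finite simulation; the printout only shows the fluctuations shrinking as $N$ increases.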
These concepts can be summarized by the following set of statements:

1. $0 \le \Pr(A) \le 1$.
2. $\Pr(A) + \Pr(B) + \Pr(C) + \cdots + \Pr(M) = 1$, for a complete set of mutually exclusive events.
3. An impossible event is represented by $\Pr(A) = 0$.
4. A certain event is represented by $\Pr(A) = 1$.

To make some of these ideas more specific, consider the following hypothetical example. Assume that a large bin contains an assortment of resistors of different sizes, which are thoroughly mixed. In particular, let there be 100 resistors having a marked value of 1 Ω, 500 resistors marked 10 Ω, 150 resistors marked 100 Ω, and 250 resistors marked 1000 Ω. Someone reaches into the bin and pulls out one resistor at random. There are now four possible outcomes corresponding to the value of the particular resistor selected. To determine the probability of each of these events we assume that the probability of each event is proportional to the number of resistors in the bin corresponding to that event. Since there are 1000 resistors in the bin altogether, the resulting probabilities are

$$\Pr(1\,\Omega) = \frac{100}{1000} = 0.1 \qquad \Pr(10\,\Omega) = \frac{500}{1000} = 0.5$$
$$\Pr(100\,\Omega) = \frac{150}{1000} = 0.15 \qquad \Pr(1000\,\Omega) = \frac{250}{1000} = 0.25$$

Note that these probabilities are all positive, less than 1, and do add up to 1.

Many times one is interested in more than one event at a time. If a coin is tossed twice, one may wish to determine the probability that a head will occur on both tosses. Such a probability is referred to as a joint probability. In this particular case, one assumes that all four possible outcomes (HH, HT, TH, and TT) are equally likely and, hence, the probability of each is one-fourth. In a more general case the situation is not this simple, so it is necessary to look at a more complicated situation in order to deduce the true nature of joint probability. The notation employed is $\Pr(A, B)$, and it signifies the probability of the joint occurrence of events A and B.

Consider again the bin of resistors and specify that in addition to having different resistance values, they also have different power ratings. Let the different power ratings be 1 W, 2 W, and 5 W; the number having each rating is indicated in Table 1-2.

Before using this example to illustrate joint probabilities, consider the probability (now referred to as a marginal probability) of selecting a resistor having a given power rating without regard to its resistance value. From the totals given in the right-hand column of Table 1-2, it is clear that these probabilities are

$$\Pr(1\,\text{W}) = \frac{440}{1000} = 0.44 \qquad \Pr(2\,\text{W}) = \frac{200}{1000} = 0.20 \qquad \Pr(5\,\text{W}) = \frac{360}{1000} = 0.36$$
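Marginal probabilities are what remains of a table of joint probabilities after one variable is summed out. The Python sketch below is an illustration only (the dictionary layout and the names `COUNTS`, `joint`, `pr_power`, `pr_resistance` are assumptions, not from the text): it encodes the counts of Table 1-2, forms each joint probability as count/1000 under the equally-likely assumption, and then sums across rows or down columns. Exact `Fraction` arithmetic keeps the results free of rounding.

```python
from fractions import Fraction

# Table 1-2 counts: power rating (rows) by resistance value (columns)
RESISTANCES = ["1 ohm", "10 ohm", "100 ohm", "1000 ohm"]
COUNTS = {
    "1 W": [50, 300, 90, 0],
    "2 W": [50, 50, 0, 100],
    "5 W": [0, 150, 60, 150],
}
TOTAL = sum(sum(row) for row in COUNTS.values())  # 1000 resistors in the bin

# Joint probabilities Pr(R, P) = count / total (each resistor assumed equally likely)
joint = {(r, p): Fraction(COUNTS[p][j], TOTAL)
         for p in COUNTS for j, r in enumerate(RESISTANCES)}

# Marginals: sum the joint probabilities over the other variable
pr_power = {p: sum(joint[(r, p)] for r in RESISTANCES) for p in COUNTS}
pr_resistance = {r: sum(joint[(r, p)] for p in COUNTS) for r in RESISTANCES}

for p, v in pr_power.items():
    print(f"Pr({p}) = {float(v):.2f}")
for r, v in pr_resistance.items():
    print(f"Pr({r}) = {float(v):.2f}")
```

The power-rating marginals come out as 0.44, 0.20, and 0.36, and the resistance marginals reproduce the 0.1, 0.5, 0.15, and 0.25 computed earlier for the same bin, since the column totals of Table 1-2 are the original resistor counts.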
Table 1-2 Resistance Values and Power Ratings

                          Resistance Values
Power Rating    1 Ω     10 Ω    100 Ω    1000 Ω    Totals
1 W             50      300     90       0         440
2 W             50      50      0        100       200
5 W             0       150     60       150       360
Totals          100     500     150      250       1000

We now ask what the joint probability is of selecting a resistor of 10 Ω having a 5-W power rating. Since there are 150 such resistors in the bin, this joint probability is clearly

$$\Pr(10\,\Omega, 5\,\text{W}) = \frac{150}{1000} = 0.15$$

The 11 other joint probabilities can be determined in a similar way. Note that some of the joint probabilities are zero [for example, $\Pr(1\,\Omega, 5\,\text{W}) = 0$] simply because a particular combination of resistance and power does not exist.

It is necessary at this point to relate the joint probabilities to the marginal probabilities. In the example of tossing a coin two times, the relationship is simply a product. That is,

$$\Pr(H, H) = \Pr(H)\Pr(H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$

But this relationship is obviously not true for the resistor bin example. Note that

$$\Pr(5\,\text{W}) = \frac{360}{1000} = 0.36$$

and it was previously shown that

$$\Pr(10\,\Omega) = 0.5$$

Thus,

$$\Pr(10\,\Omega)\Pr(5\,\text{W}) = 0.5 \times 0.36 = 0.18 \ne \Pr(10\,\Omega, 5\,\text{W}) = 0.15$$

and the joint probability is not the product of the marginal probabilities.

To clarify this point, it is necessary to introduce the concept of conditional probability. This is the probability of one event A, given that another event B has occurred; it is designated as $\Pr(A \mid B)$. In terms of the resistor bin, consider the conditional probability of selecting a 10-Ω resistor when it is already known that the chosen resistor is 5 W. Since there are 360 5-W resistors, and 150 of these are 10 Ω, the required conditional probability is

$$\Pr(10\,\Omega \mid 5\,\text{W}) = \frac{150}{360} = 0.417 \tag{1-5}$$

Now consider the product of this conditional probability and the marginal probability of selecting a 5-W resistor:

$$\Pr(10\,\Omega \mid 5\,\text{W})\Pr(5\,\text{W}) = 0.417 \times 0.36 \approx 0.15 = \Pr(10\,\Omega, 5\,\text{W})$$

It is seen that this product is indeed the joint probability.

The same result can also be obtained another way. Consider the conditional probability

$$\Pr(5\,\text{W} \mid 10\,\Omega) = \frac{150}{500} = 0.30$$

since there are 150 5-W resistors out of the 500 10-Ω resistors. Then form the product

$$\Pr(5\,\text{W} \mid 10\,\Omega)\Pr(10\,\Omega) = 0.30 \times 0.5 = 0.15 = \Pr(10\,\Omega, 5\,\text{W})$$

Again, the product is the joint probability.

The foregoing ideas concerning joint probability can be summarized in the general equation

$$\Pr(A, B) = \Pr(A \mid B)\Pr(B) = \Pr(B \mid A)\Pr(A) \tag{1-6}$$

which indicates that the joint probability of two events can always be expressed as the product of the marginal probability of one event and the conditional probability of the other event given the first event.

We now return to the coin-tossing problem, in which it is indicated that the joint probability can be obtained as the product of two marginal probabilities. Under what conditions will this be true? From equation (1-6) it appears that this can be true if

$$\Pr(A \mid B) = \Pr(A) \quad \text{and} \quad \Pr(B \mid A) = \Pr(B)$$

These statements imply that the probability of event A does not depend upon whether or not event B has occurred. This is certainly true in coin tossing, since the outcome of the second toss cannot be influenced in any way by the outcome of the first toss. Such events are said to be statistically independent. More precisely, two random events are statistically independent if and only if

$$\Pr(A, B) = \Pr(A)\Pr(B) \tag{1-7}$$

The preceding paragraphs provide a very brief discussion of many of the basic concepts of discrete probability. They have been presented in a heuristic fashion without any attempt to justify them mathematically. Instead, all of the probabilities have been formulated by invoking the concepts of relative frequency and equally likely events in terms of specific numerical examples. It is clear from these examples that it is not difficult to assign reasonable numbers to the probabilities of various events (by employing the relative-frequency approach) when the physical situation is not very involved. It should also be apparent, however, that such an approach might become unmanageable when there are many possible outcomes to any experiment and many different ways of defining events. This is particularly true when one attempts to extend the results for the discrete case to the continuous case. It becomes necessary, therefore, to reconsider all of the above ideas in a more precise manner and to introduce a measure of mathematical rigor that provides a more solid footing for subsequent extensions.

Exercise 1-4.1

a) A box contains 50 diodes of which 10 are known to be bad. A diode is selected at random. What is the probability that it is bad?
b) If the first diode drawn from the box was good, what is the probability that a second diode drawn will be good?
c) If two diodes are drawn from the box, what is the probability that they are both good?

Answers: 39/49, 156/245, 1/5

(Note: In the exercise above, and in others throughout the book, answers are not necessarily given in the same order as the questions.)

Exercise 1-4.2

A telephone switching center survey indicates that one of four calls is a business call, that one-tenth of business calls are long distance, and that one-twentieth of nonbusiness calls are long distance.

a) What is the probability that the next call will be a nonbusiness long-distance call?
b) What is the probability that the next call will be a business call given that it is a long-distance call?
c) What is the probability that the next call will be a nonbusiness call given that the previous call was long distance?

Answers: 3/80, 3/4, 2/5
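Equations (1-6) and (1-7) can be checked cell by cell against the resistor bin. The Python sketch below is illustrative only: it re-encodes Table 1-2 with hypothetical helper names (`pr`, `pr_r_given_p`, `pr_p_given_r`), computes each conditional probability as a ratio of probabilities, confirms both factorizations in (1-6) for every resistance/power pair, and applies the independence test (1-7). Exact `Fraction` arithmetic avoids the rounding in 0.417 × 0.36.

```python
from fractions import Fraction

# Table 1-2 counts keyed by (resistance in ohms, power rating in watts)
COUNTS = {
    (1, 1): 50, (10, 1): 300, (100, 1): 90, (1000, 1): 0,
    (1, 2): 50, (10, 2): 50,  (100, 2): 0,  (1000, 2): 100,
    (1, 5): 0,  (10, 5): 150, (100, 5): 60, (1000, 5): 150,
}
TOTAL = sum(COUNTS.values())
OHMS = sorted({r for r, _ in COUNTS})
WATTS = sorted({p for _, p in COUNTS})

def pr(r=None, p=None):
    """Hypothetical helper: joint or marginal probability; None means 'any value'."""
    n = sum(c for (ri, pi), c in COUNTS.items()
            if (r is None or ri == r) and (p is None or pi == p))
    return Fraction(n, TOTAL)

def pr_r_given_p(r, p):
    """Pr(resistance r | power p) = Pr(r, p) / Pr(p)."""
    return pr(r, p) / pr(p=p)

def pr_p_given_r(p, r):
    """Pr(power p | resistance r) = Pr(r, p) / Pr(r)."""
    return pr(r, p) / pr(r=r)

# Both factorizations of (1-6) hold for every cell of the table
for r in OHMS:
    for p in WATTS:
        assert pr(r, p) == pr_r_given_p(r, p) * pr(p=p) == pr_p_given_r(p, r) * pr(r=r)

print("Pr(10 ohm | 5 W) =", pr_r_given_p(10, 5))        # 5/12, about 0.417
print("Pr(5 W | 10 ohm) =", pr_p_given_r(5, 10))        # 3/10
# Independence test (1-7): 0.15 versus 0.5 x 0.36 = 0.18
print("independent?", pr(10, 5) == pr(r=10) * pr(p=5))  # False
```

The factorization checks succeed for all twelve cells, while the independence test fails, matching the conclusion that resistance value and power rating are not independent in this bin.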
1-5 Elementary Set Theory

The more precise formulation mentioned in Section 1-4 is accomplished by putting the ideas introduced in that section into the framework of the axiomatic approach. To do this, however, it is first necessary to review some of the elementary concepts of set theory.

A set is a collection of objects known as elements. It will be designated as

$$A = \{a_1, a_2, \ldots, a_n\}$$

where the set is A and the elements are $a_1, \ldots, a_n$. For example, the set A may consist of the integers from 1 to 6 so that $a_1 = 1, a_2 = 2, \ldots, a_6 = 6$ are the elements. A subset of A is any set all of whose elements are also elements of A. $B = \{1, 2, 3\}$ is a subset of the set $A = \{1, 2, 3, 4, 5, 6\}$. The general notation for indicating that B is a subset of A is $B \subset A$. Note that every set is a subset of itself.

All sets of interest in probability theory have elements taken from the largest set, called a space and designated as S. Hence, all sets will be subsets of the space S. The relation of S and its subsets to probability will become clear shortly, but in the meantime, an illustration may be helpful. Suppose that the elements of a space consist of the six faces of a die, and that these faces are designated as 1, 2, ..., 6. Thus,

$$S = \{1, 2, 3, 4, 5, 6\}$$

There are many ways in which subsets might be formed, depending upon the number of elements belonging to each subset. In fact, if one includes the null set or empty set, which has no elements in it and is denoted by $\emptyset$, there are $2^6 = 64$ subsets and they may be denoted as

$$\emptyset, \{1\}, \ldots, \{6\}, \{1, 2\}, \{1, 3\}, \ldots, \{5, 6\}, \{1, 2, 3\}, \ldots, S$$

In general, if S contains $n$ elements, then there are $2^n$ subsets. The proof of this is left as an exercise for the student.

One of the reasons for using set theory to develop probability concepts is that the important operations are already defined for sets and have simple geometric representations that aid in visualizing and understanding these operations.
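The count of $2^n$ subsets is easy to confirm by brute force. This Python sketch is an illustration using the standard `itertools.combinations`; the function name `power_set` is a hypothetical helper, not a library function. It builds every subset of the die space one size at a time, including the empty set and S itself.

```python
from itertools import combinations

def power_set(space):
    """Hypothetical helper: all subsets of `space` as frozensets, empty set first, space last."""
    elems = sorted(space)
    return [frozenset(c) for k in range(len(elems) + 1) for c in combinations(elems, k)]

S = {1, 2, 3, 4, 5, 6}
subsets = power_set(S)
print(len(subsets))              # 64 = 2**6
print(subsets[:3], subsets[-1])  # the first few subsets, then S itself
```

Grouping the subsets by size also hints at one proof: there are $\binom{n}{k}$ subsets with $k$ elements, and summing $\binom{n}{k}$ over $k = 0$ to $n$ gives $2^n$ by the binomial theorem.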
The geometric representation is the Venn diagram, in which the space S is represented by a square and the various sets are represented by closed plane figures. For example, the Venn diagram shown in Figure 1-1 shows that B is a subset of A and that C is a subset of B (and also of A). The various operations are now defined and represented by Venn diagrams.

Figure 1-1 Venn diagram for $C \subset B \subset A$.

Equality

Set A equals set B iff (if and only if) every element of A is an element of B and every element of B is an element of A. Thus

$$A = B \quad \text{iff} \quad A \subset B \text{ and } B \subset A$$

The Venn diagram is obvious and will not be shown.

Sums

The sum or union of two sets is a set consisting of all the elements that are elements of A or of B or of both. It is designated as $A \cup B$. This is shown in Figure 1-2. Since the associative law holds, the sum of more than two sets can be written without parentheses. That is,

$$(A \cup B) \cup C = A \cup (B \cup C) = A \cup B \cup C$$

The commutative law also holds, and several other results follow directly from the Venn diagram:

$$A \cup B = B \cup A$$
$$A \cup A = A$$
$$A \cup \emptyset = A$$
$$A \cup S = S$$
$$A \cup B = A, \quad \text{if } B \subset A$$

Figure 1-2 The sum of two sets, $A \cup B$.

Products

The product or intersection of two sets is the set consisting of all the elements that are common to both sets. It is designated as $A \cap B$ and is illustrated in Figure 1-3. A number of results apparent from the Venn diagram are

$$A \cap B = B \cap A \quad \text{(Commutative law)}$$
$$A \cap A = A$$
$$A \cap \emptyset = \emptyset$$
$$A \cap S = A$$
$$A \cap B = B, \quad \text{if } B \subset A$$

Figure 1-3 The intersection of two sets, $A \cap B$.

If there are more than two sets involved in the product, the Venn diagram of Figure 1-4 is appropriate. From this it is seen that

$$(A \cap B) \cap C = A \cap (B \cap C) = A \cap B \cap C \quad \text{(Associative law)}$$
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \quad \text{(Distributive law)}$$

Two sets A and B are mutually exclusive or disjoint if $A \cap B = \emptyset$. Representations of such sets in the Venn diagram do not overlap.
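Python's built-in `set` type implements these operations directly (`|` for union, `&` for intersection), so the laws above can be spot-checked on concrete sets. The sets A, B, and C below are arbitrary choices made for this sketch (with B deliberately a subset of A), not sets taken from the text; a single example illustrates the laws but does not prove them.

```python
S = set(range(1, 7))                        # the die space {1, ..., 6}
A, B, C = {1, 2, 3, 4}, {2, 3}, {3, 5, 6}   # illustrative sets; B is a subset of A

# Sums (unions)
assert (A | B) | C == A | (B | C)           # associative law
assert A | B == B | A and A | A == A        # commutative law; A U A = A
assert A | set() == A and A | S == S
assert A | B == A                           # holds because B is a subset of A

# Products (intersections)
assert A & B == B & A and A & A == A
assert A & set() == set() and A & S == A
assert A & B == B                           # holds because B is a subset of A
assert (A & B) & C == A & (B & C)           # associative law
assert A & (B | C) == (A & B) | (A & C)     # distributive law

# Mutually exclusive (disjoint) sets have an empty intersection
print("B and {5, 6} disjoint:", B.isdisjoint({5, 6}))
```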
Figure 1-4 Intersections for three sets.

Complement

The complement of a set A is a set containing all the elements of S that are not in A. It is denoted $\bar{A}$ and is shown in Figure 1-5. It is clear that

$$\bar{\emptyset} = S$$
$$\bar{S} = \emptyset$$
$$\overline{\bar{A}} = A$$
$$A \cup \bar{A} = S$$
$$A \cap \bar{A} = \emptyset$$
$$\bar{A} \subset \bar{B}, \quad \text{if } B \subset A$$
$$\bar{A} = \bar{B}, \quad \text{if } A = B$$

Two additional relations that are usually referred to as DeMorgan's laws are

$$\overline{(A \cup B)} = \bar{A} \cap \bar{B}$$
$$\overline{(A \cap B)} = \bar{A} \cup \bar{B}$$

Figure 1-5 The complement of A.

Differences

The difference of two sets, $A - B$, is a set consisting of the elements of A that are not in B. This is shown in Figure 1-6. The difference may also be expressed as

$$A - B = A \cap \bar{B} = A - (A \cap B)$$

Figure 1-6 The difference of two sets.

The notation $(A - B)$ is often read as "A take away B." The following results are also apparent from the Venn diagram:

$$(A - B) \cup B \ne A \quad \text{(in general)}$$
$$(A \cup A) - A = \emptyset$$
$$A \cup (A - A) = A$$
$$A - \emptyset = A$$
$$A - S = \emptyset$$
$$S - A = \bar{A}$$

Note that when differences are involved, the parentheses cannot be omitted.
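Because the space is finite, identities such as DeMorgan's laws can be verified exhaustively instead of from a single picture. The Python sketch below is only an illustration: the four-element space and the helper names `subsets` and `comp` are assumptions made for the example. It checks the complement, DeMorgan, and difference relations above for every pair of subsets, and also confirms that $(A - B) \cup B$ equals A exactly when $B \subset A$.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4})  # assumed small space, so every pair of subsets can be checked

def subsets(space):
    """Hypothetical helper: all 2**len(space) subsets of `space` as frozensets."""
    items = sorted(space)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def comp(A):
    """Hypothetical helper: complement of A relative to the space S."""
    return S - A

ALL = subsets(S)
for A in ALL:
    assert comp(comp(A)) == A and A | comp(A) == S and A & comp(A) == frozenset()
    assert S - A == comp(A) and A - S == frozenset() and A - frozenset() == A
    for B in ALL:
        assert comp(A | B) == comp(A) & comp(B)      # DeMorgan's laws
        assert comp(A & B) == comp(A) | comp(B)
        assert A - B == A & comp(B) == A - (A & B)    # equivalent forms of the difference
        if B <= A:
            assert comp(A) <= comp(B)                 # complementing reverses inclusion
        assert ((A - B) | B == A) == (B <= A)         # equality only when B is a subset of A
print(f"all identities hold for the {len(ALL)} subsets of {sorted(S)}")
```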
It is desirable to illustrate all of the above operations with a specific example. In order to do this, let the elements of the space S be the integers from 1 to 6, as before:

$$S = \{1, 2, 3, 4, 5, 6\}$$

and define certain sets as

$$A = \{2, 4, 6\}, \quad B = \{1, 2, 3, 4\}, \quad C = \{1, 3, 5\}$$

From the definitions just presented, it is clear that

$$(A \cup B) = \{1, 2, 3, 4, 6\}, \quad (B \cup C) = \{1, 2, 3, 4, 5\}$$
$$A \cup B \cup C = \{1, 2, 3, 4, 5, 6\} = S = A \cup C$$
$$A \cap B = \{2, 4\}, \quad B \cap C = \{1, 3\}, \quad A \cap C = \emptyset$$
$$A \cap B \cap C = \emptyset, \quad \bar{A} = \{1, 3, 5\} = C, \quad \bar{B} = \{5, 6\}$$
$$\bar{C} = \{2, 4, 6\} = A, \quad A - B = \{6\}, \quad B - A = \{1, 3\}$$
$$A - C = \{2, 4, 6\} = A, \quad C - A = \{1, 3, 5\} = C, \quad B - C = \{2, 4\}$$
$$C - B = \{5\}, \quad (A - B) \cup B = \{1, 2, 3, 4, 6\}$$

The student should verify these results.

Exercise 1-5.1

If A and B are subsets of the same space, S, find

a) $(A \cap B) \cup (A - B)$
b) $A \cap (A - B)$
c) $(A \cap B) \cap (B \cup A)$

Answers: $A \cap B$, $\emptyset$, $A$

Exercise 1-5.2

Using the algebra of sets show that the following relations are true:

a) $A \cup (A \cap B) = A$
b) $A \cup (\bar{A} \cap B) = A \cup B$

1-6 The Axiomatic Approach

It is now necessary to relate probability theory to the set concepts that have just been discussed. This relationship is established by defining a probability space whose elements are all the outcomes (of a possible set of outcomes) from an experiment. For example, if an experimenter chooses to view the six faces of a die as the possible outcomes, then the probability space associated with throwing a die is the set

$$S = \{1, 2, 3, 4, 5, 6\}$$

The various subsets of S can be identified with the events. For example, in the case of throwing a die, the event {2} corresponds to obtaining the outcome 2, while the event {1, 2, 3} corresponds to the outcomes of either 1, or 2, or 3. Since at least one outcome must be obtained on each trial, the space S corresponds to the certain event and the empty set $\emptyset$ corresponds to the impossible event. Any event consisting of a single element is called an elementary event.

The next step is to assign to each event a number called, as before, the probability of the event. If the event is denoted as A, the probability of event A is denoted as $\Pr(A)$. This number is chosen so as to satisfy the following three conditions or axioms:

$$\Pr(A) \ge 0 \tag{1-9}$$
$$\Pr(S) = 1 \tag{1-10}$$
$$\text{If } A \cap B = \emptyset, \text{ then } \Pr(A \cup B) = \Pr(A) + \Pr(B) \tag{1-11}$$

The whole body of probability can be deduced from these axioms. It should be emphasized, however, that axioms are postulates and, as such, it is meaningless to try to prove them. The only possible test of their validity is whether the resulting theory adequately represents the real world. The same is true of any physical theory.

A large number of corollaries can be deduced from these axioms and a few are developed here. First, since

$$S \cap \emptyset = \emptyset \quad \text{and} \quad S \cup \emptyset = S$$

it follows from (1-11) that

$$\Pr(S \cup \emptyset) = \Pr(S) = \Pr(S) + \Pr(\emptyset)$$

Hence,

$$\Pr(\emptyset) = 0 \tag{1-12}$$

Next, since

$$A \cap \bar{A} = \emptyset \quad \text{and} \quad A \cup \bar{A} = S$$

it also follows from (1-11) and (1-10) that

$$\Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A}) = \Pr(S) = 1 \tag{1-13}$$

From this and from (1-9)

$$\Pr(A) = 1 - \Pr(\bar{A}) \le 1 \tag{1-14}$$

Therefore, the probability of an event must be a number between 0 and 1.

If A and B are not mutually exclusive, then (1-11) usually does not hold. A more general result can be obtained, however. From the Venn diagram of Figure 1-3 it is apparent that

$$A \cup B = A \cup (\bar{A} \cap B)$$

and that A and $\bar{A} \cap B$ are mutually exclusive. Hence, from (1-11) it follows that

$$\Pr(A \cup B) = \Pr[A \cup (\bar{A} \cap B)] = \Pr(A) + \Pr(\bar{A} \cap B)$$

From the same figure it is also apparent that

$$B = (A \cap B) \cup (\bar{A} \cap B)$$

and that $A \cap B$ and $\bar{A} \cap B$ are mutually exclusive. From (1-11)

$$\Pr(B) = \Pr[(A \cap B) \cup (\bar{A} \cap B)] = \Pr(A \cap B) + \Pr(\bar{A} \cap B) \tag{1-15}$$

Upon eliminating $\Pr(\bar{A} \cap B)$, it follows that

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B) \tag{1-16}$$

which is the desired result.

Now that the formalism of the axiomatic approach has been established, it is desirable to look at the problem of constructing probability spaces. First consider the case of throwing a single die and the associated probability space of $S = \{1, 2, 3, 4, 5, 6\}$. The elementary events are simply the integers associated with the upper face of the die and these are clearly mutually exclusive. If the elementary events are assumed to be equally probable, then the probability associated with each is simply

$$\Pr\{\alpha_i\} = \frac{1}{6}, \qquad \alpha_i = 1, 2, \ldots, 6$$

Note that this assumption is consistent with the relative-frequency approach, but within the framework of the axiomatic approach it is only an assumption, and any number of other assumptions could have been made.

For this same probability space, consider the event $A = \{1, 3\} = \{1\} \cup \{3\}$. From (1-11)

$$\Pr(A) = \Pr\{1\} + \Pr\{3\} = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$

and this can be interpreted as the probability of throwing either a 1 or a 3. A somewhat more complex situation arises when $A = \{1, 3\}$, $B = \{3, 5\}$ and it is desired to determine $\Pr(A \cup B)$. Since A and B are not mutually exclusive, the result of (1-16) must be used. From the calculation above, it is clear that $\Pr(A) = \Pr(B) = 1/3$. However, since $A \cap B = \{3\}$, an elementary event, it must be that $\Pr(A \cap B) = 1/6$. Hence, from (1-16)

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{3} + \frac{1}{3} - \frac{1}{6} = \frac{1}{2}$$

An alternative approach is to note that $A \cup B = \{1, 3, 5\}$, which is composed of three mutually exclusive elementary events. Using (1-11) twice leads immediately to

$$\Pr(A \cup B) = \Pr\{1\} + \Pr\{3\} + \Pr\{5\} = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$$

Note that this can be interpreted as the probability of either A occurring or B occurring or both occurring.

Exercise 1-6.1

A roulette wheel has 36 slots painted alternately red and black and numbered from 1 to 36. A 37th slot is painted green and numbered zero. Bets can be made in two ways: selecting a number from 1 to 36, which pays 35:1 if that number wins, or selecting two adjacent numbers, which pays 17:1 if either number wins. Let event A be the occurrence of the number 1 when the wheel is spun and event B be the occurrence of the number 2.

a) Find $\Pr(A)$ and the probable return on a one-dollar bet on this number.
b) Find $\Pr(A \cup B)$ and the probable return on a one-dollar bet on $A \cup B$.

Answers: 1/37, 36/37, 36/37, 2/37
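Under the equally-likely assumption, the probability of any die event is simply its number of elements divided by 6, which makes (1-16) easy to test on every pair of events rather than just one. The Python sketch below is an illustration (the helper name `prob` is hypothetical); it reproduces the value 1/2 obtained above for $A = \{1, 3\}$, $B = \{3, 5\}$ both by direct counting and by (1-16), and then checks (1-12), (1-14), and (1-16) for all 64 × 64 pairs of events.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))

def prob(event):
    """Hypothetical helper: Pr(event) for a fair die, each elementary event having probability 1/6."""
    return Fraction(len(event & S), len(S))

A, B = frozenset({1, 3}), frozenset({3, 5})
print(prob(A), prob(B), prob(A & B))    # 1/3 1/3 1/6
print(prob(A | B))                      # 1/2, by direct counting
print(prob(A) + prob(B) - prob(A & B))  # 1/2, by (1-16)

# Check (1-12), (1-14) and (1-16) on every pair of events in the space
events = [frozenset(c) for k in range(7) for c in combinations(sorted(S), k)]
assert prob(frozenset()) == 0 and prob(S) == 1
for E in events:
    assert prob(E) == 1 - prob(S - E)
    for F in events:
        assert prob(E | F) == prob(E) + prob(F) - prob(E & F) <= prob(E) + prob(F)
```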
- 33. 22 CHAPTER 1 · I NTRODUCTIONExercise 1-6.2Draw a /13nn diagram showing three subsets that are not mutually exclusive.Using this diagram derive an expression for Pr (A u B u C).Answer: Pr (A) + Pr (B) + Pr (C) - Pr (A n B) - Pr (A n C) - Pr (B n C) +Pr (A n B n C)1 -7 Conditional ProbabilityThe conceptofconditional probability was introducedin Section 1-3 on the basis ofthe relativefrequency of one event when another event is specified to have occurred. In the axiomaticapproach, conditional probability is a defined quantity. If an event B is assumed to have anonzero probability, then the conditional probability of an event A, given B, is defined asPr (AIB) =Pr (A n B)Pr (B)Pr (B) > 0 (1-17)where Pr (A n B) is theprobability oftheevent A n B. Inthepreviousdiscussion, thenumeratorof (1-17) was written as Pr (A, B) and was called thejoint probability of events A and B. Thisinterpretation is still correct ifA and B are elementary events, but in the more general case theproper interpretation must be based on the set theory concept ofthe product, A n B, oftwo sets.Obviously, ifA and B are mutually exclusive, then A n B is the empty set and Pr (A n B) = 0.On the other hand, ifA is contained in B (that is, A c B), then A n B = A andPr (A)Pr (AIB) =Pr (B)..: Pr (A)Finally, if B c A, then A n B = B andPr (B)Pr (AIB) = -- = lPr (B)In general, however, when neither A c B nor B c A, nothing can be asserted regarding therelative magnitudes of Pr (A) and Pr (AIB).So far it has not yet been shown that conditional probabilities are really probabilities in thesense that they satisfy the basic axioms. In the relative-frequency approach they are clearlyprobabilities in that they could be defined as ratios of the numbers of favorable occurrences tothe total number of trials, but in the axiomatic approach conditional probabilities are definedquantities; hence, it is necessary to verify independently their validity as probabilities.The first axiom isPr (AIB) ..: 0
- 34. 1 - 7 CONDITIONAL PROBABILITY 23and this is obviously true from the definition (1-17) since both numerator and denominator arepositive numbers. The second axiom isPr (SIB) = 1and this is also apparent since B c S so that S n B = B and Pr (S n B) = Pr (B). To verifythat the third axiom holds, consider another event, C, such that A n C = 0 (that is, A and Care mutually exclusive). ThenPr [(A u C) n B] = Pr [(A n B) u (C n B)] = Pr (A n B) + Pr (C n B)since (A n B) and (C n B) are also mutually exclusive events and (1-1 1 ) holds for such events.So, from (1-17)Pr [(A U C)IB]_ Pr [(A U C) n B] _ Pr (A n B)+Pr (C n B)Pr (B) Pr (B) Pr (B)= Pr (AIB) + Pr (CIB)Thus the third axiom does hold, and it is now clear that conditional probabilities are validprobabilities in every sense.Before extending the topic ofconditional probabilities, it is desirable to consider an examplein which the events are not elementary events. Let the experiment be the throwing of a singledie and let the outcomes be the integers from 1 to 6. Then define event A as A = { l , 2}, that is,the occurrence of a 1 or a 2. From previous considerations it is clear that Pr (A) = � + � = �Define B as the event of obtaining an even number. That is, B = {2, 4, 6} and Pr (B) = 4since it is composed of three elementary events. The event A n B is A n B = {2}, from whichPr (A n B) = �- The conditional probability, Pr (AIB), is now given byPr (AIB) =Pr (A n B)=1 =�Pr (B) 4 3This indicates that the conditional probability of throwing a 1 or a 2, given that the outcome is. Ieven, 1s 3 .On the other hand, suppose it is desired to find the conditional probability of throwing aneven number given that the outcome was a 1 or a 2. This isPr (B IA) =Pr (A n B)=1Pr (A) �a result that is intuitively correct.-2One of the uses of conditional probability is in the evaluation of total probability. 
Supposethere are n mutually exclusive events A1, A2, • • • , An and an arbitrary event B as shown in theVenn diagram of Figure 1-7. The events A occupy the entire space, S, so that
$$A_1 \cup A_2 \cup \cdots \cup A_n = S \tag{1-18}$$

Since $A_i$ and $A_j$ ($i \neq j$) are mutually exclusive, it follows that $B \cap A_i$ and $B \cap A_j$ are also mutually exclusive. Further,

$$B = B \cap S = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$$

because of (1-18). Hence, from (1-11),

$$\Pr(B) = \Pr(B \cap A_1) + \Pr(B \cap A_2) + \cdots + \Pr(B \cap A_n) \tag{1-19}$$

But from (1-17)

$$\Pr(B \cap A_i) = \Pr(B \mid A_i)\Pr(A_i)$$

Substituting into (1-19) yields

$$\Pr(B) = \Pr(B \mid A_1)\Pr(A_1) + \Pr(B \mid A_2)\Pr(A_2) + \cdots + \Pr(B \mid A_n)\Pr(A_n) \tag{1-20}$$

Figure 1-7 Venn diagram for total probability.

The quantity $\Pr(B)$ is the total probability and is expressed in (1-20) in terms of its various conditional probabilities.

An example serves to illustrate an application of total probability. Consider a resistor carrousel containing six bins. Each bin contains an assortment of resistors as shown in Table 1-3. If one of the bins is selected at random,¹ and a single resistor drawn from that bin at random, what is the probability that the resistor chosen will be 10 Ω?

Table 1-3 Resistance Values

| Ohms | Bin 1 | Bin 2 | Bin 3 | Bin 4 | Bin 5 | Bin 6 | Total |
|---|---|---|---|---|---|---|---|
| 10 Ω | 500 | 0 | 200 | 800 | 1200 | 1000 | 3700 |
| 100 Ω | 300 | 400 | 600 | 200 | 800 | 0 | 2300 |
| 1000 Ω | 200 | 600 | 200 | 600 | 0 | 1000 | 2600 |
| Totals | 1000 | 1000 | 1000 | 1600 | 2000 | 2000 | 8600 |

The $A_i$ events in (1-20) can be associated with the bin chosen so that

$$\Pr(A_i) = \frac{1}{6}, \qquad i = 1, 2, 3, 4, 5, 6$$

since it is assumed that the choices of bins are equally likely. The event $B$ is the selection of a 10-Ω resistor, and the conditional probabilities can be related to the numbers of such resistors in each bin. Thus

$$\Pr(B \mid A_1) = \frac{500}{1000} = \frac{1}{2}, \qquad \Pr(B \mid A_2) = \frac{0}{1000} = 0, \qquad \Pr(B \mid A_3) = \frac{200}{1000} = \frac{2}{10}$$

$$\Pr(B \mid A_4) = \frac{800}{1600} = \frac{1}{2}, \qquad \Pr(B \mid A_5) = \frac{1200}{2000} = \frac{6}{10}, \qquad \Pr(B \mid A_6) = \frac{1000}{2000} = \frac{1}{2}$$

Hence, from (1-20) the total probability of selecting a 10-Ω resistor is

$$\Pr(B) = \frac{1}{2}\cdot\frac{1}{6} + 0\cdot\frac{1}{6} + \frac{2}{10}\cdot\frac{1}{6} + \frac{1}{2}\cdot\frac{1}{6} + \frac{6}{10}\cdot\frac{1}{6} + \frac{1}{2}\cdot\frac{1}{6} = 0.3833$$

It is worth noting that the concepts of equally likely events and relative frequency have been used in assigning values to the conditional probabilities above, but that the basic relationship expressed by (1-20) is derived from the axiomatic approach.

The probabilities $\Pr(A_i)$ in (1-20) are often referred to as a priori probabilities because they are the ones that describe the probabilities of the events $A_i$ before any experiment is performed. After an experiment is performed, and event $B$ observed, the probabilities that describe the events $A_i$ are the conditional probabilities $\Pr(A_i \mid B)$.
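As a numerical check on the carrousel example, the short Python sketch below (not part of the original text) recomputes the total probability (1-20) directly from Table 1-3 using exact fractions. The dictionary layout `TABLE_1_3` and the function name `total_probability` are illustrative choices, and the code assumes, as the text does, that each of the six bins is equally likely to be chosen.

```python
from fractions import Fraction

# Table 1-3: resistor counts by value (ohms), listed for bins 1..6
TABLE_1_3 = {
    10:   [500,   0, 200, 800, 1200, 1000],
    100:  [300, 400, 600, 200,  800,    0],
    1000: [200, 600, 200, 600,    0, 1000],
}


def total_probability(ohms, table=TABLE_1_3):
    """Pr(B) via (1-20), where B = 'a resistor of value `ohms` is drawn'
    and A_i = 'bin i is chosen', with every bin assumed equally likely."""
    n_bins = len(table[ohms])
    prior = Fraction(1, n_bins)                           # Pr(A_i) = 1/6
    pr_b = Fraction(0)
    for i in range(n_bins):
        bin_size = sum(counts[i] for counts in table.values())
        likelihood = Fraction(table[ohms][i], bin_size)   # Pr(B | A_i)
        pr_b += likelihood * prior
    return pr_b


p = total_probability(10)
print(p, round(float(p), 4))   # 23/60 0.3833
```

Working in `Fraction` keeps the result exact (23/60), so the rounded value 0.3833 quoted in the text can be seen to be a rounding of an exact rational number rather than a floating-point artifact.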
These conditional probabilities, $\Pr(A_i \mid B)$, may be expressed in terms of those already discussed by rewriting (1-17) as

$$\Pr(A_i \cap B) = \Pr(A_i \mid B)\Pr(B) = \Pr(B \mid A_i)\Pr(A_i)$$

¹ The phrase "at random" is usually interpreted to mean "with equal probability."
The last form in the above is obtained by simply interchanging the roles of $B$ and $A_i$. The second equality may now be written

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\Pr(A_i)}{\Pr(B)}, \qquad \Pr(B) \neq 0 \tag{1-21}$$

into which (1-20) may be substituted to yield

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\Pr(A_i)}{\Pr(B \mid A_1)\Pr(A_1) + \cdots + \Pr(B \mid A_n)\Pr(A_n)} \tag{1-22}$$

The conditional probability $\Pr(A_i \mid B)$ is often called the a posteriori probability because it applies after the experiment is performed; and either (1-21) or (1-22) is referred to as Bayes' theorem.

The a posteriori probability may be illustrated by continuing the example just discussed. Suppose the resistor that is chosen from the carrousel is found to be a 10-Ω resistor. What is the probability that it came from bin three? Since $B$ is still the event of selecting a 10-Ω resistor, the conditional probabilities $\Pr(B \mid A_i)$ are the same as tabulated before. Furthermore, the a priori probabilities are still $\frac{1}{6}$. Thus, from (1-21) and the previous evaluation of $\Pr(B)$,

$$\Pr(A_3 \mid B) = \frac{\left(\frac{2}{10}\right)\left(\frac{1}{6}\right)}{0.3833} = 0.08696$$

This is the probability that the 10-Ω resistor, chosen at random, came from bin three.

Exercise 1-7.1

Using the data of Table 1-3, find the probabilities that:

a) a 1000-Ω resistor that is selected came from bin 4.

b) a 10-Ω resistor that is selected came from bin 3.

Answers: 0.20000, 0.08696

Exercise 1-7.2

A manufacturer of electronic equipment purchases 1000 ICs from supplier A, 2000 ICs from supplier B, and 3000 ICs from supplier C. Testing reveals that the conditional probability of an IC failing during burn-in is, for devices from each of the suppliers,
$$\Pr(F \mid A) = 0.05, \qquad \Pr(F \mid B) = 0.10, \qquad \Pr(F \mid C) = 0.10$$

The ICs from all suppliers are mixed together and one device is selected at random.

a) What is the probability that it will fail during burn-in?

b) Given that the device fails, what is the probability that the device came from supplier A?

Answers: 0.09091, 0.09167

1-8 Independence

The concept of statistical independence is a very important one in probability. It was introduced in connection with the relative-frequency approach by considering two trials of an experiment, such as tossing a coin, in which it is clear that the second trial cannot depend upon the outcome of the first trial in any way. Now that a more general formulation of events is available, this concept can be extended. The basic definition is unchanged, however:

Two events, $A$ and $B$, are independent if and only if

$$\Pr(A \cap B) = \Pr(A)\Pr(B) \tag{1-23}$$

In many physical situations, independence of events is assumed because there is no apparent physical mechanism by which one event can depend upon the other. In other cases, the assumed probabilities of the elementary events lead to independence of other events defined from these. In such cases, independence may not be obvious, but can be established from (1-23).

The concept of independence can also be extended to more than two events. For example, with three events, the conditions for independence are

$$\Pr(A_1 \cap A_2) = \Pr(A_1)\Pr(A_2)$$

$$\Pr(A_2 \cap A_3) = \Pr(A_2)\Pr(A_3)$$

$$\Pr(A_1 \cap A_3) = \Pr(A_1)\Pr(A_3)$$

$$\Pr(A_1 \cap A_2 \cap A_3) = \Pr(A_1)\Pr(A_2)\Pr(A_3)$$

Note that four conditions must be satisfied, and that pairwise independence is not sufficient for the entire set of events to be mutually independent. In general, if there are $n$ events, it is necessary that

$$\Pr(A_i \cap A_j \cap \cdots \cap A_k) = \Pr(A_i)\Pr(A_j)\cdots\Pr(A_k) \tag{1-24}$$

for every set of two or more distinct integers $i, j, \ldots, k$ less than or equal to $n$. Since a set of $n$ events has $2^n$ subsets, of which $n$ contain a single event and one is empty, this implies that $2^n - (n + 1)$ equations of the form (1-24) are required to establish the independence of $n$ events.
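The counting argument and the requirement (1-24) can be made concrete with a small Python sketch (an illustration, not from the text). `independence_conditions` lists every index subset that (1-24) must check, and `mutually_independent` tests those conditions on a finite space of equally likely outcomes; both names are hypothetical. The two-coin events used at the end are a standard illustration chosen here, not taken from the text, of events that are pairwise independent but not mutually independent.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def independence_conditions(n):
    """Index subsets that (1-24) must check: every subset of 2, 3, ..., n events."""
    return [idx for r in range(2, n + 1) for idx in combinations(range(n), r)]


def mutually_independent(events, space):
    """Test (1-24) for events (sets of outcomes) on a finite space of
    equally likely outcomes, using exact fractions."""
    def pr(event):
        return Fraction(len(event), len(space))

    for idx in independence_conditions(len(events)):
        joint = set(space)
        for i in idx:
            joint &= events[i]
        if pr(joint) != prod(pr(events[i]) for i in idx):
            return False
    return True


# Number of conditions equals 2**n - (n + 1)
print([len(independence_conditions(n)) for n in range(2, 6)])   # [1, 4, 11, 26]

# Illustration (not from the text): two fair coin tosses
space = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}          # first toss is a head
B = {"HH", "TH"}          # second toss is a head
C = {"HH", "TT"}          # both tosses agree
print(all(mutually_independent(pair, space) for pair in [[A, B], [A, C], [B, C]]))  # True
print(mutually_independent([A, B, C], space))                                      # False
```

Each pair satisfies (1-23), but $\Pr(A \cap B \cap C) = \frac{1}{4} \neq \frac{1}{8}$, which is exactly why the fourth condition in the three-event list cannot be dropped.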
One important consequence of independence is a special form of (1-16), which stated

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$$

If $A$ and $B$ are independent events, this becomes

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A)\Pr(B) \tag{1-25}$$

Another result of independence is

$$\Pr[A_1 \cap (A_2 \cup A_3)] = \Pr(A_1)\Pr(A_2 \cup A_3) \tag{1-26}$$

if $A_1$, $A_2$, and $A_3$ are all independent. This is not true if they are independent only in pairs. In general, if $A_1, A_2, \ldots, A_n$ are independent events, then any one of them is independent of any event formed by sums, products, and complements of the others.

Examples of physical situations that illustrate independence are most often associated with two or more trials of an experiment. However, for purposes of illustration, consider two events associated with a single experiment. Let the experiment be that of rolling a pair of dice and define event $A$ as that of obtaining a 7 and event $B$ as that of obtaining an 11. Are these events independent? The answer is that they cannot be independent because they are mutually exclusive: if one occurs the other one cannot. Mutually exclusive events with nonzero probabilities can never be statistically independent.

As a second example consider two events that are not mutually exclusive. For the pair of dice above, define event $A$ as that of obtaining an odd number and event $B$ as that of obtaining an 11. The event $A \cap B$ is just $B$ since $B$ is a subset of $A$. Hence, $\Pr(A \cap B) = \Pr(B) = \Pr(11) = 2/36 = 1/18$, since there are two ways an 11 can be obtained (that is, a 5 and a 6 or a 6 and a 5). Also $\Pr(A) = \frac{1}{2}$, since half of all outcomes are odd. It follows then that

$$\Pr(A \cap B) = \frac{1}{18} \neq \Pr(A)\Pr(B) = \left(\frac{1}{2}\right)\left(\frac{1}{18}\right) = \frac{1}{36}$$

Thus, events $A$ and $B$ are not statistically independent. That this must be the case is obvious since if $B$ occurs then $A$ must also occur, although the converse is not true.

It is also possible to define events associated with a single trial that are independent, but these sets may not represent any physical situation.
For example, consider throwing a single die and define two events as $A = \{1, 2, 3\}$ and $B = \{3, 4\}$. From previous results it is clear that $\Pr(A) = \frac{1}{2}$ and $\Pr(B) = \frac{1}{3}$. The event $(A \cap B)$ contains a single element $\{3\}$; hence, $\Pr(A \cap B) = \frac{1}{6}$. Thus, it follows that

$$\Pr(A \cap B) = \frac{1}{6} = \Pr(A)\Pr(B) = \frac{1}{2}\cdot\frac{1}{3} = \frac{1}{6}$$

and events $A$ and $B$ are independent, although the physical significance of this is not intuitively clear. The next section considers situations in which there is more than one experiment, or more than one trial of a given experiment, and that discussion will help clarify the matter.
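The three single-experiment examples above can be checked by brute-force enumeration. The Python sketch below is illustrative only. It assumes fair dice, so that the 36 ordered pairs (or the 6 faces of a single die) are equally likely, and it tests (1-23) with exact fractions. The helper names `pr` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

DICE = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs
DIE = range(1, 7)                             # 6 equally likely faces


def pr(event, space):
    """Probability of an event (a predicate on outcomes) by counting."""
    return Fraction(sum(1 for w in space if event(w)), len(space))


def independent(a, b, space):
    """Check (1-23): Pr(A and B) == Pr(A) Pr(B)."""
    return pr(lambda w: a(w) and b(w), space) == pr(a, space) * pr(b, space)


seven = lambda w: sum(w) == 7
eleven = lambda w: sum(w) == 11
odd = lambda w: sum(w) % 2 == 1

print(independent(seven, eleven, DICE))   # False: mutually exclusive
print(independent(odd, eleven, DICE))     # False: 1/18 != 1/36

A = lambda w: w in {1, 2, 3}
B = lambda w: w in {3, 4}
print(independent(A, B, DIE))             # True: 1/6 == (1/2)(1/3)
```

Representing events as predicates rather than explicit sets keeps the dice and single-die cases in one small helper; the check is exact because only counting and `Fraction` arithmetic are involved.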
Exercise 1-8.1

A card is selected at random from a standard deck of 52 cards. Let $A$ be the event of selecting an ace, and let $B$ be the event of selecting a red card. Are these events statistically independent? Prove your answer.

Answer: Yes

Exercise 1-8.2

In the switching circuit shown below, the switches are assumed to operate randomly and independently.

[Switching-circuit diagram with switches A, B, C, and D]

The probabilities of the switches being closed are $\Pr(A) = 0.1$, $\Pr(B) = \Pr(C) = 0.5$, and $\Pr(D) = 0.2$. Find the probability that there is a complete path through the circuit.

Answer: 0.0400

1-9 Combined Experiments

In the discussion of probability presented thus far, the probability space, $S$, was associated with a single experiment. This concept is too restrictive to deal with many realistic situations, so it is necessary to generalize it somewhat. Consider a situation in which two experiments are performed. For example, one experiment might be throwing a die and the other one tossing a coin. It is then desired to find the probability that the outcome is, say, a "3" on the die and a "tail" on the coin. In other situations the second experiment might be simply a repeated trial of the first experiment. The two experiments, taken together, form a combined experiment, and it is now necessary to find the appropriate probability space for it.

Let one experiment have a space $S_1$ and the other experiment a space $S_2$. Designate the elements of $S_1$ as
$$S_1 = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$$

and those of $S_2$ as

$$S_2 = \{\beta_1, \beta_2, \ldots, \beta_m\}$$

Then form a new space, called the cartesian product space, whose elements are all the ordered pairs $(\alpha_1, \beta_1), (\alpha_1, \beta_2), \ldots, (\alpha_i, \beta_j), \ldots, (\alpha_n, \beta_m)$. Thus, if $S_1$ has $n$ elements and $S_2$ has $m$ elements, the cartesian product space has $mn$ elements. The cartesian product space may be denoted as

$$S = S_1 \times S_2$$

to distinguish it from the previous product or intersection discussed in Section 1-5.

As an illustration of the cartesian product space for combined experiments, consider the die and the coin discussed above. For the die the space is

$$S_1 = \{1, 2, 3, 4, 5, 6\}$$

while for the coin it is

$$S_2 = \{H, T\}$$

Thus, the cartesian product space has 12 elements and is

$$S = S_1 \times S_2 = \{(1, H), (1, T), (2, H), (2, T), (3, H), (3, T), (4, H), (4, T), (5, H), (5, T), (6, H), (6, T)\}$$

It is now necessary to define the events of the new probability space. If $A_1$ is a subset considered to be an event in $S_1$ and $A_2$ is a subset considered to be an event in $S_2$, then $A = A_1 \times A_2$ is an event in $S$. For example, in the above illustration let $A_1 = \{1, 3, 5\}$ and $A_2 = \{H\}$. The event $A$ corresponding to these is

$$A = A_1 \times A_2 = \{(1, H), (3, H), (5, H)\}$$

To specify the probability of event $A$, it is necessary to consider whether the two experiments are independent; the only cases discussed here are those in which they are independent. In such cases the probability in the product space is simply the product of the probabilities in the original spaces. Thus, if $\Pr(A_1)$ is the probability of event $A_1$ in space $S_1$, and $\Pr(A_2)$ is the probability of $A_2$ in space $S_2$, then the probability of event $A$ in space $S$ is

$$\Pr(A) = \Pr(A_1)\Pr(A_2) \tag{1-27}$$
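The die-and-coin construction can be mirrored directly in Python. The sketch below is illustrative, not from the text. It assumes a fair die and a fair coin that do not influence each other, so that all 12 ordered pairs are equally likely, and it compares the product rule (1-27) with a direct count over $S_1 \times S_2$. `itertools.product` builds the ordered pairs.

```python
from fractions import Fraction
from itertools import product

S1 = [1, 2, 3, 4, 5, 6]          # die
S2 = ["H", "T"]                  # coin
S = list(product(S1, S2))        # cartesian product S1 x S2: 12 ordered pairs

A1 = {1, 3, 5}                   # odd number on the die
A2 = {"H"}                       # head on the coin
A = [(a, b) for (a, b) in S if a in A1 and b in A2]   # A = A1 x A2

pr_A1 = Fraction(len(A1), len(S1))
pr_A2 = Fraction(len(A2), len(S2))
# Direct count is valid here only because all 12 pairs are assumed equally likely
pr_A_enumerated = Fraction(len(A), len(S))

print(len(S), A)                       # 12 [(1, 'H'), (3, 'H'), (5, 'H')]
print(pr_A1 * pr_A2, pr_A_enumerated)  # 1/4 1/4
```

The agreement of the two numbers is what the independence assumption buys: with dependent experiments the product rule (1-27) would no longer match a count over the product space.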
Result (1-27) may be illustrated by data from the above example. From previous results, $\Pr(A_1) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$ when $A_1 = \{1, 3, 5\}$ and $\Pr(A_2) = \frac{1}{2}$ when $A_2 = \{H\}$. Thus, the probability of getting an odd number on the die and a head on the coin is

$$\Pr(A) = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{1}{4}$$

It is possible to generalize the above ideas in a straightforward manner to situations in which there are more than two experiments. However, this will be done only for the more specialized situation of repeating the same experiment an arbitrary number of times.

Exercise 1-9.1

A combined experiment is performed by flipping a coin three times. The elements of the product space are HHH, HHT, HTH, etc.

a) Write all the elements of the cartesian product space.

b) Find the probability of obtaining exactly one head.

c) Find the probability of obtaining at least two tails.

Answers: 3/8, 1/2

Exercise 1-9.2

A combined experiment is performed in which two coins are flipped and a single die is rolled. The outcomes from flipping the coins are taken to be HH, TT, and HT (which is taken to be a single outcome regardless of which coin is heads and which coin is tails). The outcomes from rolling the die are the integers from one to six.

a) Write all the elements in the cartesian product space.

b) Let $A$ be the event of obtaining two heads and a number of 3 or less. Find the probability of $A$.

Answer: 1/8

1-10 Bernoulli Trials

The situation considered here is one in which the same experiment is repeated $n$ times and it is desired to find the probability that a particular event occurs exactly $k$ of these times. For
example, what is the probability that exactly four heads will be observed when a coin is tossed 10 times? Such repeated experiments are referred to as Bernoulli trials.

Consider some experiment for which the event $A$ has a probability $\Pr(A) = p$. Hence, the probability that the event does not occur is $\Pr(\bar{A}) = q$, where $p + q = 1$.² Then repeat this experiment $n$ times and assume that the trials are independent; that is, that the outcome of any one trial does not depend in any way upon the outcomes of any previous (or future) trials. Next determine the probability that event $A$ occurs exactly $k$ times in some specific order, say in the first $k$ trials and none thereafter. Because the trials are independent, the probability of this event is

$$\underbrace{\Pr(A)\Pr(A)\cdots\Pr(A)}_{k\ \text{of these}}\;\underbrace{\Pr(\bar{A})\Pr(\bar{A})\cdots\Pr(\bar{A})}_{n-k\ \text{of these}} = p^k q^{n-k}$$

However, there are many other ways in which $A$ could occur exactly $k$ times, because the occurrences can arise in any order. Furthermore, because of the independence, all of these other orders have exactly the same probability as the one specified above. Hence, the event that $A$ occurs $k$ times in any order is the sum of the mutually exclusive events that $A$ occurs $k$ times in some specific order, and thus, the probability that $A$ occurs $k$ times is simply the above probability for a particular order multiplied by the number of different orders that can occur.

It is necessary to digress at this point and briefly discuss the theory of combinations in order to be able to determine the number of different orders in which the event $A$ can occur exactly $k$ times in $n$ trials. It is apparent that when one forms a sequence of length $n$, the first $A$ can go in any one of the $n$ places, the second $A$ can go into any one of the remaining $n - 1$ places, and so on, leaving $n - k + 1$ places for the $k$th $A$. The product of these various possibilities, $n(n-1)\cdots(n-k+1)$, counts the placements as if the $k$ A's were distinguishable from one another. Thus, since the $k!$
possible orderings of the $k$ A's among the chosen places all produce the same sequence, the number of distinct sequences of length $n$ containing exactly $k$ A's is

$$\frac{1}{k!}\left[n(n-1)(n-2)\cdots(n-k+1)\right] = \frac{n!}{k!\,(n-k)!} = \binom{n}{k} \tag{1-28}$$

The quantity on the right is simply the binomial coefficient, which is usually denoted either as ${}_nC_k$ or as $\binom{n}{k}$.³ The latter notation is employed here.

As an example of binomial coefficients, let $n = 4$ and $k = 2$. Then

$$\binom{n}{k} = \frac{4!}{2!\,2!} = 6$$

and there are six different sequences in which the event $A$ occurs exactly twice. These can be enumerated easily as

$$AA\bar{A}\bar{A}, \quad A\bar{A}A\bar{A}, \quad A\bar{A}\bar{A}A, \quad \bar{A}AA\bar{A}, \quad \bar{A}A\bar{A}A, \quad \bar{A}\bar{A}AA$$

² The only justification for changing the notation from $\Pr(A)$ to $p$ and from $\Pr(\bar{A})$ to $q$ is that the $p$ and $q$ notation is traditional in discussing Bernoulli trials and most of the literature uses it.

³ A table of binomial coefficients is given in Appendix C.
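The counting argument behind (1-28) can be sketched in Python (illustrative only). `binom_from_product` follows the text literally: it forms the product $n(n-1)\cdots(n-k+1)$ and divides by $k!$. `sequences` enumerates the distinct arrangements by choosing which $k$ positions hold an $A$. Writing the complement $\bar{A}$ as a lowercase `a` is a notational choice made only for this sketch, as are the function names. Python's standard `math.comb` is used as an independent cross-check.

```python
from itertools import combinations
from math import comb, factorial


def binom_from_product(n, k):
    """(1-28): n(n-1)...(n-k+1) placements divided by the k! identical orderings."""
    falling = 1
    for j in range(k):
        falling *= n - j
    return falling // factorial(k)   # exact: k! always divides the product


def sequences(n, k):
    """All length-n strings with exactly k A's; 'a' stands for the complement of A."""
    out = []
    for positions in combinations(range(n), k):
        out.append("".join("A" if i in positions else "a" for i in range(n)))
    return out


print(binom_from_product(4, 2), comb(4, 2))   # 6 6
print(sequences(4, 2))   # ['AAaa', 'AaAa', 'AaaA', 'aAAa', 'aAaA', 'aaAA']
```

Choosing positions with `combinations` produces each distinct sequence exactly once, which is the enumeration-side counterpart of dividing by $k!$.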
It is now possible to write the desired probability of $A$ occurring $k$ times as

$$p_n(k) = \Pr\{A \text{ occurs } k \text{ times}\} = \binom{n}{k} p^k q^{n-k} \tag{1-29}$$

As an illustration of a possible application of this result, consider a digital computer in which the binary digits (0 or 1) are organized into "words" of 32 digits each. If there is a probability of $10^{-3}$ that any one binary digit is incorrectly read, what is the probability that there is one error in an entire word? For this case, $n = 32$, $k = 1$, and $p = 10^{-3}$. Hence,

$$\Pr\{\text{one error in a word}\} = p_{32}(1) = \binom{32}{1}(10^{-3})^1(0.999)^{31} = 32(0.999)^{31}(10^{-3}) \approx 0.031$$

It is also possible to use (1-29) to find the probability that there will be no error in a word. For this, $k = 0$ and $\binom{32}{0} = 1$. Thus,

$$\Pr\{\text{no error in a word}\} = p_{32}(0) = \binom{32}{0}(10^{-3})^0(0.999)^{32} = (0.999)^{32} \approx 0.9685$$

There are many other practical applications of Bernoulli trials. For example, if a system has $n$ components and there is a probability $p$ that any one of them will fail, independently of the others, the probability that one and only one component will fail is

$$\Pr\{\text{one failure}\} = p_n(1) = \binom{n}{1} p\, q^{n-1}$$

In some cases, one may be interested in determining the probability that event $A$ occurs at least $k$ times, or the probability that it occurs no more than $k$ times. These probabilities may be obtained by simply adding the probabilities of all the outcomes that are included in the desired event. For example, if a coin is tossed four times, what is the probability of obtaining at least two heads? For this case, $p = q = \frac{1}{2}$ and $n = 4$. From (1-29) the probability of getting two heads (that is, $k = 2$) is

$$p_4(2) = \binom{4}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^2 = (6)\left(\frac{1}{4}\right)\left(\frac{1}{4}\right) = \frac{3}{8}$$

Similarly, the probability of three heads is
$$p_4(3) = \binom{4}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^1 = (4)\left(\frac{1}{8}\right)\left(\frac{1}{2}\right) = \frac{1}{4}$$

and the probability of four heads is

$$p_4(4) = \binom{4}{4}\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^0 = \frac{1}{16}$$

Hence, the probability of getting at least two heads is

$$\Pr\{\text{at least two heads}\} = p_4(2) + p_4(3) + p_4(4) = \frac{3}{8} + \frac{1}{4} + \frac{1}{16} = \frac{11}{16}$$

The general formulation of problems of this kind can be expressed quite easily, but there are several different situations that arise. These may be tabulated as follows:

$$\Pr\{A \text{ occurs less than } k \text{ times in } n \text{ trials}\} = \sum_{i=0}^{k-1} p_n(i)$$

$$\Pr\{A \text{ occurs more than } k \text{ times in } n \text{ trials}\} = \sum_{i=k+1}^{n} p_n(i)$$

$$\Pr\{A \text{ occurs no more than } k \text{ times in } n \text{ trials}\} = \sum_{i=0}^{k} p_n(i)$$

$$\Pr\{A \text{ occurs at least } k \text{ times in } n \text{ trials}\} = \sum_{i=k}^{n} p_n(i)$$

A final comment in regard to Bernoulli trials has to do with evaluating $p_n(k)$ when $n$ is large. Since the binomial coefficients and the large powers of $p$ and $q$ become difficult to evaluate in such cases, often it is necessary to seek simpler, but approximate, ways of carrying out the calculation. One such approximation, known as the DeMoivre-Laplace theorem, is useful if $npq \gg 1$ and if $|k - np|$ is on the order of or less than $\sqrt{npq}$. This approximation is

$$p_n(k) = \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-(k-np)^2/2npq} \tag{1-30}$$

The DeMoivre-Laplace theorem has additional significance when continuous probability is considered in a subsequent chapter. However, a simple illustration of its utility in discrete probability is worthwhile. Suppose a coin is tossed 100 times and it is desired to find the probability of $k$ heads, where $k$ is in the vicinity of 50. Since $p = q = \frac{1}{2}$ and $n = 100$, (1-30) yields

$$p_n(k) \approx \frac{1}{\sqrt{50\pi}}\, e^{-(k-50)^2/50}$$

for $k$ values ranging (roughly) from 40 to 60. This is obviously much easier to evaluate than trying to find the binomial coefficient $\binom{100}{k}$ for the same range of $k$ values.
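The formulas (1-29) and (1-30) translate almost line for line into Python. The sketch below is an illustration, not the authors' code. `p_n`, `at_least`, and `demoivre_laplace` are hypothetical names, and the other three tabulated sums follow from `at_least` by changing the summation limits. It reproduces the word-error, coin-toss, and 100-toss examples of this section.

```python
from math import comb, exp, pi, sqrt


def p_n(n, k, p):
    """(1-29): probability that A occurs exactly k times in n independent trials."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)


def at_least(n, k, p):
    """Pr{A occurs at least k times in n trials} = sum of p_n(i) for i = k..n."""
    return sum(p_n(n, i, p) for i in range(k, n + 1))


def demoivre_laplace(n, k, p):
    """(1-30): Gaussian approximation, intended for npq >> 1 and |k - np| <~ sqrt(npq)."""
    npq = n * p * (1.0 - p)
    return exp(-(k - n * p) ** 2 / (2 * npq)) / sqrt(2 * pi * npq)


print(p_n(32, 1, 1e-3))      # one error in a 32-bit word, about 0.031
print(p_n(32, 0, 1e-3))      # no error in the word, about 0.9685
print(at_least(4, 2, 0.5))   # at least two heads in four tosses: 0.6875 = 11/16

# Exact binomial versus DeMoivre-Laplace for 100 coin tosses
for k in (40, 45, 50, 55, 60):
    print(k, round(p_n(100, k, 0.5), 5), round(demoivre_laplace(100, k, 0.5), 5))
```

Python's integer `math.comb` makes the exact binomial easy to evaluate even at $n = 100$, so the loop is a convenient way to see how close (1-30) is over the range of $k$ the text suggests.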
1-11 Applications of Bernoulli Trials

Because of the extensive use of Bernoulli trials in many engineering applications, it is useful to examine a few of these applications in more detail. Three such applications are considered here. The first application pertains to digital communication systems in which special types of coding are used in order to reduce errors in the received signal. This is usually referred to as error-correction coding. The second considers a radar system that employs a type of target detection known as binary integration or double-threshold detection. Finally, the third example is one that arises in connection with system reliability.

Digital communication systems transmit messages that have been converted into sequences of binary digits (bits) that have values of either 0 or 1. For practical implementation reasons it is convenient to separate these sequences into blocks, each containing the same number of bits. Each block is usually referred to as a word.

Any transmitted word is received correctly only if all the bits in that word are detected correctly. Because of noise, interference, or multipath in the communication channel, one or more of the bits in any given word may be received incorrectly and, thus, suggest that a different word was transmitted. To avoid errors of this type it is common to increase the length of the word by adding additional bits (known as check digits) that are uniquely related to the actual message bits. Appropriate processing at the receiver then makes it possible to correctly decode the word provided that the number of bits received in error is not greater than some specified value.
For example, a double-error-correcting code will produce the correct message word if no more than two bits are received in error in each code word.

To illustrate the effectiveness of such an approach, assume that each message word contains five bits and is transmitted, without error-correction coding, in a channel in which the probability of any one bit being received in error is 0.01. Because there is no error-correction coding, the probability that a given word is received correctly is just the probability that no bits are received in error. The probability of this event, from (1-29), is

$$\Pr(\text{Correct Word}) = p_5(0) = \binom{5}{0}(0.01)^0(1 - 0.01)^5 = 0.951$$

Next assume that a double-error-correcting code exists in which 5 check digits are added to the 5 message digits so that each transmitted word is now 10 bits long. The message word will be correctly decoded now if there are no bits received in error, one bit received in error, or two bits received in error. The sum of the probabilities of these three events is the probability that a given message word is correctly decoded. Hence,

$$\Pr(\text{Correct Word}) = \binom{10}{0}(0.01)^0(1 - 0.01)^{10} + \binom{10}{1}(0.01)^1(1 - 0.01)^9 + \binom{10}{2}(0.01)^2(1 - 0.01)^8 = 0.9999$$

It is clear that the probability of correctly receiving this message word has been greatly increased.
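Both word-error calculations are instances of one sum: the probability that at most $t$ of $n$ bits are in error. The minimal Python sketch below assumes, as (1-29) does, that bit errors are independent with the same probability. The function name `pr_word_correct` is hypothetical, and the code does not model any particular error-correcting code; it only counts error patterns the decoder is assumed to handle.

```python
from math import comb


def pr_word_correct(n_bits, max_errors, p_bit):
    """Probability that at most max_errors of n_bits are received in error,
    assuming independent bit errors with probability p_bit each."""
    q = 1.0 - p_bit
    return sum(comb(n_bits, k) * p_bit**k * q**(n_bits - k)
               for k in range(max_errors + 1))


uncoded = pr_word_correct(5, 0, 0.01)    # 5 message bits, no correction
coded = pr_word_correct(10, 2, 0.01)     # 5 message + 5 check bits, up to 2 corrected
print(f"{uncoded:.3f} {coded:.4f}")      # 0.951 0.9999
```

Setting `max_errors=0` recovers the uncoded case, so the same function covers both numbers quoted in the text.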
A radar system transmits short pulses of RF energy and receives the reflected pulses, along with noise, in a suitable receiver. To improve the probability of detecting the reflected pulses, it is customary to base the detection on a number of pulses rather than just one. Although there are optimum techniques for processing such a sequence of received pulses, a simple suboptimum technique involves the use of two thresholds. If the received signal pulse, or noise, or both, exceed the first threshold, the observation is declared to result in a 1. If the first threshold is not exceeded, the observation is declared to result in a 0. After $n$ observations (i.e., Bernoulli trials), if the number of 1s is equal to or greater than some value $m \le n$, a detection is declared. The value of $m$ is the second threshold and is selected on the basis of some criterion of performance. Because we are adding 1s and 0s, this procedure is referred to as binary integration.

The two aspects of performance that are usually of greatest importance are the probability of detection and the probability of false alarm. The probability of detection is the probability that a real target will actually be detected and is desired to be as close to one as possible. The probability of false alarm is the probability that a detection will be declared when there is only noise into the receiver and is desired to be as close to zero as possible. Using the results in the previous section, the probability of detection can be written as

$$\Pr(\text{Detection}) = \sum_{k=m}^{n} \binom{n}{k} P_s^k (1 - P_s)^{n-k}$$

where $P_s$ is the probability that any one signal pulse will exceed the first threshold. Similarly, the probability of false alarm becomes

$$\Pr(\text{False alarm}) = \sum_{k=m}^{n} \binom{n}{k} P_n^k (1 - P_n)^{n-k}$$

where $P_n$ is the probability that noise alone will exceed the threshold in any one observation. Note that these two expressions are the same except for the value of the first-threshold probabilities that are used.

To illustrate this technique, assume that $P_s = 0.4$ and $P_n = 0.1$.
(Methods for determining these values are considered in subsequent chapters.) Although there are methods for determining the best value of $m$ to use for any given value of $n$, arbitrarily select $m$ to be the nearest integer to $n/4$. The resulting probabilities of detection and false alarm are shown in Figure 1-8 as a function of $n$, the number of Bernoulli trials. (The ragged nature of these curves is a consequence of requiring $m$ to be an integer.) Note that the probability of detection increases and the probability of false alarm decreases as the number of pulses integrated, $n$, is made larger. Thus, larger $n$ improves the radar performance. The disadvantage of this, of course, is that it takes longer to make a detection.

Figure 1-8 Result of binary integration in a radar system. [Plot: Pr(detection) and Pr(false alarm) versus Number of Bernoulli Trials, 0 to 100.]

The third application of Bernoulli trials to be discussed involves the use of redundancy to improve system reliability. Components in a complex and expensive system that are essential to its operation, and difficult or impossible to replace, are often replicated in the system so that if one component fails another one may continue to function. A good example of this is found in communication satellites, in which each satellite carries a number of amplifiers that can be switched into various configurations as required. These amplifiers are usually traveling-wave tubes (TWTs) at frequencies above 6 GHz, although solid-state amplifiers are sometimes used at lower frequencies. As amplifiers die through the years, the amount of traffic that can be carried by the satellite is reduced until there is at last no useful transmission capability. Clearly, replacing dead amplifiers in a satellite is not an easy task.

To illustrate how redundancy can extend the useful life of the communication satellite, assume that a given satellite contains 24 amplifiers with 12 being used for transmission in one direction and 12 for transmission in the reverse direction, and they are always used in pairs to accommodate two-way traffic on every channel. Assume further that the probability that any one amplifier will fail within the first 5 years is 0.6, and that the two amplifiers that make up a pair are always the same. Hence, the probability that both amplifiers in a given pair are still functioning after 5 years is

$$\Pr(\text{Good Pair}) = (1 - 0.6)^2 = 0.16$$

The probability that one or more of the 12 amplifier pairs are still functioning after 5 years is simply 1 minus the probability that all pairs have failed. From the previous equation, the probability that any one pair has failed is 0.84. Thus,

$$\Pr(\text{One or More Good Pairs}) = 1 - 0.84^{12} = 0.877$$

This result assumes that the two amplifiers that make up a pair are always the same and that it is not possible to switch amplifiers to make pairs with different combinations. In actuality,
In actuality,such switching is possible so that the last good pair of amplifiers can be any two ofthe original24 amplifiers. Now the probability that there are one or more good pairs is simply 1 minus theprobability that exactly 22 amplifiers have failed. This isPr (One or More Good Pairs) = 1-G�)o.622(1-0.6)2= 0.999
Notice the significant improvement in reliability that has resulted from adding the amplifier switching capability to the communications satellite. Note also that the above calculation is much easier than trying to calculate the probability that two or more amplifiers are good.

Exercise 1-10.1

A pair of dice are tossed 10 times.

a) Find the probability that a 6 will occur exactly 4 times.

b) Find the probability that a 10 will occur 2 times.

c) Find the probability that a 12 will occur more than once. Hint: Subtract the probability of a 12 occurring once or not at all from 1.0.

Answers: 0.1558, 0.0299, 0.0430

Exercise 1-10.2

A manufacturer of electronic equipment buys 1000 ICs for which the probability of any one IC being bad is 0.01. Using the DeMoivre-Laplace theorem, determine:

a) What is the probability that exactly 10 of the ICs are bad?

b) What is the probability that none of the ICs is bad?

c) What is the probability that exactly one of the ICs is bad?

Answers: 0.1268, $4.36 \times 10^{-4}$, $4.32 \times 10^{-5}$

PROBLEMS

Note that the first two digits of each problem number correspond to the section number in which the appropriate material is discussed.

1-1.1 A six-cell storage battery having a nominal terminal voltage of 12 V is connected in series with an ammeter and a resistor labeled 6 Ω.
a) List as many random quantities as you can for this circuit.

b) If the battery voltage can have any value between 10.5 and 12.5 V, the resistor can have any value within 5% of its marked value, and the ammeter reads within 2% of the true current, find the range of possible ammeter readings. Neglect ammeter resistance.

c) List any nonrandom quantities you can for this circuit.

1-1.2 In determining the probability characteristics of printed English, it is common to consider a 27-letter alphabet in which the space between words is counted as a letter. Punctuation is usually ignored.

a) Count the number of times each of the 27 letters appears in this problem.

b) On the basis of this count, deduce the most probable letter, the next most probable letter, and the least probable letter (or letters).

1-2.1 For each of the following random experiments, list all of the possible outcomes and state whether these outcomes are equally likely.

a) Flipping two coins.

b) Observing the last digit of a telephone number selected at random from the directory.

c) Observing the sum of the last two digits of a telephone number selected at random from the directory.

1-2.2 State whether each of the following defined events is an elementary event.

a) Obtaining a seven when a pair of dice are rolled.

b) Obtaining two heads when three coins are flipped.

c) Obtaining an ace when a card is selected at random from a deck of cards.

d) Obtaining a two of spades when a card is selected at random from a deck of cards.

e) Obtaining a two when a pair of dice are rolled.

f) Obtaining three heads when three coins are flipped.

g) Observing a value less than ten when a random voltage is observed.
h) Observing the letter e sixteen times in a piece of text.

1-4.1 If a die is rolled, determine the probability of each of the following events.

a) Obtaining the number 5.

b) Obtaining a number greater than 3.

c) Obtaining an even number.

1-4.2 If a pair of dice are rolled, determine the probability of each of the following events.

a) Obtaining a sum of 11.

b) Obtaining a sum less than 5.

c) Obtaining a sum that is an even number.

1-4.3 A box of unmarked ICs contains 200 hex inverters, 100 dual 4-input positive-AND gates, 50 dual J-K flip flops, 25 decade counters, and 25 4-bit shift registers.

a) If an IC is selected at random, what is the probability that it is a dual J-K flip flop?

b) What is the probability that an IC selected at random is not a hex inverter?

c) If the first IC selected is found to be a 4-bit shift register, what is the probability that the second IC selected will also be a 4-bit shift register?

1-4.4 In the IC box of Problem 1-4.3 it is known that 10% of the hex inverters are bad, 15% of the dual 4-input positive-AND gates are bad, 18% of the dual J-K flip flops are bad, and 20% of the decade counters and 4-bit shift registers are bad.

a) If an IC is selected at random, what is the probability that it is both a decade counter and good?

b) If an IC is selected at random and found to be a J-K flip flop, what is the probability that it is good?

c) If an IC is selected at random and found to be good, what is the probability that it is a decade counter?

1-4.5 A company manufactures small electric motors having horsepower ratings of 0.1, 0.5, or 1.0 horsepower and designed for operation with 120 V single-phase ac, 240 V single-phase ac, or 240 V three-phase ac. The motor types can be distinguished only
by their nameplates. A distributor has on hand 3000 motors in the quantities shown in the table below.

| Horsepower | 120 V ac | 240 V ac | 240 V 3φ |
|---|---|---|---|
| 0.1 | 900 | 400 | 0 |
| 0.5 | 200 | 500 | 100 |
| 1.0 | 100 | 200 | 600 |

One motor is discovered without a nameplate. For this motor determine the probability of each of the following events.

a) The motor has a horsepower rating of 0.5 hp.

b) The motor is designed for 240 V single-phase operation.

c) The motor is 1.0 hp and is designed for 240 V three-phase operation.

d) The motor is 0.1 hp and is designed for 120 V operation.

1-4.6 In Problem 1-4.5, assume that 10% of the motors labeled 120 V single-phase are mismarked and that 5% of the motors marked 240 V single-phase are mismarked.

a) If a motor is selected at random, what is the probability that it is mismarked?

b) If a motor is picked at random from those marked 240 V single-phase, what is the probability that it is mismarked?

c) What is the probability that a motor selected at random is 0.5 hp and mismarked?

1-4.7 A box contains 25 transistors, of which 4 are known to be bad. A transistor is selected at random and tested.

a) What is the probability that it is bad?

b) If the first transistor tests bad, what is the probability that a second transistor selected at random will also be bad?

c) If the first transistor tested is good, what is the probability that the second transistor selected at random will be bad?

1-4.8 A traffic survey on a busy highway reveals that one of every four vehicles is a truck. This survey also established that one-eighth of all automobiles are unsafe to drive and one-twentieth of all trucks are unsafe to drive.
a) What is the probability that the next vehicle to pass a given point is an unsafe truck?
b) What is the probability that the next vehicle will be a truck, given that it is unsafe?
c) What is the probability that the next vehicle that passes a given point will be a truck, given that the previous vehicle was an automobile?

1-5.1 Prove that a space S containing n elements has 2ⁿ subsets. Hint: Use the binomial expansion for (1 + x)ⁿ.

1-5.2 A space S is defined as
S = {1, 3, 5, 7, 9, 11}
and three subsets as
A = {1, 3, 5}, B = {7, 9, 11}, C = {1, 3, 9, 11}
Find:
A ∪ B      A ∩ S ∩ C (B ∩ C)
B ∪ C      A          A − C
A ∪ C      B          C − A
A ∩ B      C          A − B
A ∩ C      A ∩ B      (A − B) ∪ B
B ∩ C      A ∩ B      (A − B) ∪ C

1-5.3 Draw and label the Venn diagram for Problem 1-4.4.

1-5.4 Using the algebra of sets show that the following relations are true.
a) A ∪ (A ∩ B) = A
b) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
c) A ∪ (Ā ∩ B) = A ∪ B
d) (A ∩ B) ∪ (A ∩ B) ∪ (A ∩ B) = A

1-5.5 If A and B are subsets in the same space S, find
a) (A − B) ∩ (B − A)
b) (A − B) ∩ B
c) (A − B) ∪ (A ∩ B)

1-5.6 A space S = {a, b, c, d, e, f} has two subsets defined as A = {a, c, e} and B = {c, d, e, f}. Find
a) A ∪ B
b) A ∩ B
c) (A − B)
d) A ∩ B
e) A ∩ B
f) (B − A) ∪ A

1-6.1 For the space and subspaces defined in Problem 1-5.2, assume that each element has a probability of 1/6. Find the following probabilities.
a) Pr(A)
b) Pr(B)
c) Pr(C)
d) Pr(A ∪ B)
e) Pr(A ∪ C)
f) Pr[(A − C) ∪ B]

1-6.2 A card is drawn at random from a standard deck of 52 cards. Let A be the event that a king is drawn, B the event that a spade is drawn, and C the event that a ten of spades is drawn. Describe each of the events listed below and calculate its probability.
a) A ∪ B
b) A ∩ B
c) A ∪ B
d) A ∪ C
e) B ∪ C
f) A ∩ C
g) B ∩ C
h) (A ∩ B) ∪ C
i) A ∩ B ∩ C

1-6.3 An experiment consists of randomly drawing three cards in succession from a standard deck of 52 cards. Let A be the event of a king on the first draw, B the event of a king on the second draw, and C the event of a king on the third draw. Describe each of the events listed below and calculate its probability.
a) A ∩ B
b) A ∪ B
c) A ∪ B
d) A ∩ B̄ ∩ C
e) (A ∩ B) ∪ (B ∩ C)
f) A ∪ B ∪ C

1-6.4 Prove that Pr(A ∪ B) = 1 − Pr(Ā ∩ B̄).

1-6.5 Two solid-state diodes are connected in series. Each diode has a probability of 0.05 that it will fail as a short circuit and a probability of 0.1 that it will fail as an open circuit. If
the diodes are independent, what is the probability that the series connection of diodes will function as a diode?

1-6.6 A dodecahedron is a solid with 12 sides and is often used to display the 12 months of the year. When this object is rolled, let the outcome be taken as the month appearing on the upper face. Also let A = {January}, B = {any month with 31 days}, and C = {any month with 30 days}. Find
a) Pr(A ∪ B)
b) Pr(A ∩ B)
c) Pr(C ∪ B)
d) Pr(A ∩ C)

1-7.1 In a digital communication system, messages are encoded into the binary symbols 0 and 1. Because of noise in the system, the incorrect symbol is sometimes received. Suppose that the probability of a 0 being transmitted is 0.4 and the probability of a 1 being transmitted is 0.6. Further suppose that the probability of a transmitted 0 being received as a 1 is 0.08 and the probability of a transmitted 1 being received as a 0 is 0.05. Find:
a) The probability that a received 0 was transmitted as a 0.
b) The probability that a received 1 was transmitted as a 1.
c) The probability that any symbol is received in error.

1-7.2 A certain typist sometimes makes mistakes by hitting a key to the right or left of the intended key, each with a probability of 0.02. The letters E, R, and T are adjacent to one another on the standard QWERTY keyboard, and in English they occur with probabilities of Pr(E) = 0.1031, Pr(R) = 0.0484, and Pr(T) = 0.0796.
a) What is the probability with which the letter R appears in text typed by this typist?
b) What is the probability that a letter R appearing in text typed by this typist will be in error?

1-7.3 A candy machine has 10 buttons of which one never works, two work one-half the time, and the rest work all the time.
A coin is inserted and a button is pushed at random.
a) What is the probability that no candy is received?
b) If no candy is received, what is the probability that the button that never works was the one pushed?
c) If candy is received, what is the probability that one of the buttons that work one-half the time was the one pushed?
1-7.4 A fair coin is tossed. If it comes up heads, a single die is rolled. If it comes up tails, two dice are rolled. Given that the outcome of the dice is 3, but you do not know whether one or two dice were rolled, what is the probability that the coin came up heads?

1-7.5 A communication network has five links as shown below.
The probability that each link is working is 0.9. What is the probability of being able to transmit a message from point A to point B?

1-7.6 A manufacturer buys components in equal amounts from three different suppliers. The probability that components from supplier A are bad is 0.1, that components from supplier B are bad is 0.15, and that components from supplier C are bad is 0.05. Find
a) The probability that a component selected at random will be bad.
b) If a component is found to be bad, what is the probability that it came from supplier B?

1-7.7 An electronics hobbyist has three electronic parts cabinets with two drawers each. One cabinet has NPN transistors in each drawer, while a second cabinet has PNP transistors in each drawer. The third cabinet has NPN transistors in one drawer and PNP transistors in the other drawer. The hobbyist selects one cabinet at random and withdraws a transistor from one of the drawers.
a) What is the probability that an NPN transistor will be selected?
b) Given that the hobbyist selects an NPN transistor, what is the probability that it came from the cabinet that contains both types?
c) Given that an NPN transistor is selected, what is the probability that it comes from the cabinet that contains only NPN transistors?
1-7.8 If Pr(A) > Pr(B), show that Pr(A | B) > Pr(B | A).

1-8.1 When a pair of dice are rolled, let A be the event of obtaining a number of 6 or greater and let B be the event of obtaining a number of 6 or less. Are events A and B dependent or independent?

1-8.2 If A, B, and C are independent events, prove that the following are also independent:
a) A and B ∪ C.
b) A and B ∩ C.
c) A and B − C.

1-8.3 A pair of dice are rolled. Let A be the event of obtaining an odd number on the first die and B be the event of obtaining an odd number on the second die. Let C be the event of obtaining an odd total from both dice.
a) Show that A and B are independent, that A and C are independent, and that B and C are independent.
b) Show that A, B, and C are not mutually independent.

1-8.4 If A is independent of B, prove that:
a) A is independent of B.
b) A is independent of B.

1-9.1 A combined experiment is performed by rolling a die with sides numbered from 1 to 6 and a child's block with sides labeled A through F.
a) Write all of the elements of the Cartesian product space.
b) Define K as the event of obtaining an even number on the die and a letter of B or C on the block and find the probability of the event K.

1-9.2 An electronic manufacturer uses four different types of ICs in manufacturing a particular device. The NAND gates (designated as G if good and Ḡ if bad) have a probability of 0.05 of being bad. The flip flops (F and F̄) have a probability of 0.1 of being bad, the counters (C and C̄) have a probability of 0.03 of being bad, and the shift registers (S and S̄) have a probability of 0.12 of being bad.
a) Write all of the elements in the product space.
b) Determine the probability that the manufactured device will work.
c) If a particular device does not work, determine the probability that only the flip flops are bad.
d) If a particular device does not work, determine the probability that both the flip flops and the counters are bad.

1-9.3 A combined experiment is performed by flipping a coin three times.
a) Write all of the elements in the product space by indicating them as HHH, HTH, etc.
b) Find the probability of obtaining exactly two heads.
c) Find the probability of obtaining more than one head.

1-10.1 Two men each flip a coin three times.
a) What is the probability that both men will get exactly two heads each?
b) What is the probability that one man will get no heads and the other man will get three heads?

1-10.2 In playing an opponent of equal ability, which is more probable:
a) To win 4 games out of 7, or to win 5 games out of 9?
b) To win at least 4 games out of 7, or to win at least 5 games out of 9?

1-10.3 Prove that ${}_{n}C_{r}$ is equal to ${}_{n-1}C_{r} + {}_{n-1}C_{r-1}$.

1-10.4 A football receiver, Harvey Gladiator, is able to catch two-thirds of the passes thrown to him. He must catch three passes for his team to win the game. The quarterback throws the ball to Harvey four times.
a) Find the probability that Harvey will drop the ball all four times.
b) Find the probability that Harvey will win the game.

1-10.5 Out of a group of seven EEs and five MEs, a committee consisting of three EEs and two MEs is to be formed. In how many ways can this be done if:
a) Any EE and any ME can be included?
b) One particular EE must be on the committee?
c) Two particular MEs cannot be on the committee?

1-10.6 In the digital communication system of Problem 1-7.1, assume that the event of an error occurring in one binary symbol is statistically independent of the event of an error occurring in any other binary symbol. Find
a) The probability of receiving six successive symbols without error.
b) The probability of receiving six successive symbols with exactly one error.
c) The probability of receiving six successive symbols with more than one error.
d) The probability of receiving six successive symbols with one or more errors.

1-10.7 A multichannel microwave link is to provide telephone communication to a remote community having 12 subscribers, each of whom uses the link 20% of the time during peak hours. How many channels are needed to make the link available during peak hours to:
a) Eighty percent of the subscribers all of the time?
b) All of the subscribers 80% of the time?
c) All of the subscribers 95% of the time?

1-10.8 A file containing 10,000 characters is to be transferred from one computer to another. The probability of any one character being transferred in error is 0.001.
a) Find the probability that the file can be transferred without any errors.
b) Using the DeMoivre-Laplace theorem, find the probability that there will be exactly 10 errors in the transferred file.
c) What must the probability of error in transferring one character be in order to make the probability of transferring the entire file without error as large as 0.99?

1-10.9 Much of the early interest in probability arose out of a desire to predict the results of various gambling games. One such game is roulette in which a wheel is divided into a number of separate compartments and a small ball is caused to spin around the wheel with bets placed as to which compartment it will fall into. A typical roulette wheel has 38 compartments numbered 00, 0, 1, 2, ..., 36.
Many ways of betting are possible; however, only one will be considered here, viz., betting that the number will be either odd or even. The bettor can win only with numbers 1-36; the 0 and 00 are automatic
house wins. Many schemes have been devised to beat the house. The most common one is to double your bet when a loss occurs and keep it constant when you win. To test this system the following MATLAB M-file was written to simulate a series of bets using this system. (See Appendix G for a discussion of MATLAB.)

```matlab
% P1_10_9.m  Simulation of the "double the bet after a loss" system
B = 1;              % size of standard bet
T(1) = 0;           % initial total winnings
rng(1000)           % seed the generator (older releases: rand('seed',1000))
for m = 2:50
    clear y w
    y(1) = 0;       % running winnings within this betting sequence
    w(1) = B;       % current wager
    for k = 2:10000
        x = rand;
        if x <= 18/38               % 18:38 probability of winning
            y(k) = y(k-1) + w(k-1);
            w(k) = B;               % after a win, return to the standard bet
        else
            y(k) = y(k-1) - w(k-1);
            w(k) = 2*w(k-1);        % after a loss, double the bet
        end
        if w(k) >= 100*B            % required bet has grown too large: stop
            break
        elseif y(k) >= 100*B        % winnings target reached: stop
            break
        end
    end
    T(m) = T(m-1) + y(k);           % accumulate the result of this sequence
end
plot(T); xlabel('Game Number'); ylabel('Total Winnings'); grid
```

The program makes a bet and then determines the outcome using a random number generator (rand in the program) that generates numbers uniformly distributed between 0 and 1. The probability of winning for either odd or even is 18/38; therefore, if the random number is less than or equal to 18/38 the bet is won, otherwise the bet is lost. If the bet is lost the next wager is made twice as large as the last one. The betting sequence ends when the magnitude of the required bet reaches 100 times the nominal value or when the winnings equal or exceed 100 times the nominal wager value. When this occurs the sequence is reinitiated. In the program as written, the sequence is repeated 50 times and the winnings or losses accumulated.
a) Make a plot of the accumulated winnings after 50 repetitions.
b) Why does the bettor always lose in the long run?
c) Repeat (a) after changing the win probability to 0.5.
References

All of the following texts provide coverage of the topics discussed in Chapter 1. Particularly useful and readily understandable discussions are contained in Beckmann, Drake, Gnedenko and Khinchin, Laning and Battin, and Parzen.

1. Beckmann, P., Elements of Applied Probability Theory. New York: Harcourt, Brace and World, Inc., 1968.
This book provides coverage of much of the material discussed in the first six chapters of the present text. The mathematical level is essentially the same as the present text but the point of view is often different, thereby providing useful amplifications or extensions of the concepts being considered. A number of interesting examples are worked out in the text.

2. Clarke, A. B., and R. L. Disney, Probability and Random Processes. New York: John Wiley and Sons, Inc., 1985.
This is an undergraduate text intended for students in engineering and science. Its mathematical level is somewhat higher than that of the present text and the topical coverage is more restricted. The coverage of probability and random processes is thorough and accurate, but there is no discussion of the application of these concepts to system analysis. There is an extensive treatment of Markov processes and queueing theory, topics not usually found in an undergraduate text.

3. Childers, D. G., Probability and Random Processes. Chicago: Irwin, 1997.
A senior or first year graduate level text covering many of the same topics as the present text as well as a number of additional topics. It assumes a more advanced mathematical background for the student than the present text, e.g., linear algebra and matrix methods are employed in many derivations and applications. MATLAB is used to illustrate and simulate analytical results and a number of problems specifically designed for the use of MATLAB are included.

4. Davenport, W. B., Jr., and W. L. Root, Introduction to Random Signals and Noise. New York: McGraw-Hill, Inc., 1958.
This is a graduate level text dealing with the application of probabilistic methods to the analysis of communication systems. The treatment is at an appreciably more advanced mathematical level than the present text and will require some diligent effort on the part of an undergraduate wishing to read it. However, the effort will be amply rewarded as this is the classic book in its field and is the most frequently quoted reference.

5. Drake, A. W., Fundamentals of Applied Probability Theory. New York: McGraw-Hill, Inc., 1967.
This undergraduate text covers the elementary aspects of probability theory in a clear and readable fashion. The material relates directly to Chapters 1, 2, and 3 of the present text. Of particular interest is the use of exponential (Fourier) transforms of the probability density functions of continuous random variables and Z-transforms of the probability density functions of discrete random variables in place of the classical characteristic function procedure.

6. Gardner, W. A., Introduction to Random Processes, 2nd ed. New York: McGraw-Hill, Inc., 1986.
This is a graduate level text on random processes for engineering and science students interested in analysis and design of signals and systems. The early chapters provide a review of probability and random variables. The later chapters provide coverage of a wide range of topics related to random processes including a number of practical applications of the theory. Although written at a higher mathematical level than the present text, much of the material is readily understandable to undergraduate students with a typical mathematical background.

7. Gnedenko, B. V., and A. Ya. Khinchin, An Elementary Introduction to the Theory of Probability. New York: Dover Publications, Inc., 1962.
This small paperback book was written by two outstanding Russian mathematicians for use in high schools. It provides a very clear and easily understood introduction to many of the basic concepts of probability that are discussed in Chapters 1 and 2 of the present text. The mathematical level is quite low and, in fact, does not go beyond simple algebra. Nevertheless, the subject matter is of fundamental importance and much useful insight into probability theory can be obtained from a study of this book.

8. Helstrom, C. W., Probability and Stochastic Processes for Engineers, 2nd ed. New York: Macmillan, Inc., 1991.
This is an undergraduate text written expressly for engineering students. Although somewhat more mathematical than the present text, it is straightforward and easy to read. The book emphasizes probability, random variables, and random processes, but contains very little on the application of these concepts to system analysis. A great many excellent problems are included.

9. Laning, J. H., Jr., and R. H. Battin, Random Processes in Automatic Control. New York: McGraw-Hill, Inc., 1956.
This book is a graduate level text in the field of automatic control. However, the first half of the book provides a particularly clear and understandable treatment of probability and random processes at a level that is readily understandable by juniors or seniors in electrical engineering. A number of topics, such as random processes, are treated in greater detail than in the present text. This reference contains material relating to virtually all of the topics covered in the present text although some of the applications considered in later chapters involve more advanced mathematical concepts.

10. Papoulis, A., Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, Inc., 1991.
This is a widely used graduate level text aimed at electrical engineering applications of probability theory. Virtually all of the topics covered in the present text plus a great many more are included. The treatment is appreciably more abstract and mathematical than the present text, but a wide range of useful examples and results are given. This book provides the most readily available source for many of these results.

11. Parzen, E., Modern Probability Theory and Its Applications. New York: John Wiley and Sons, Inc., 1992.
This is a standard undergraduate text on the mathematical theory of probability. The material is clearly presented and many interesting applications of probability are considered in the examples and problems.

12. Peebles, P. Z., Probability, Random Variables, and Random Signal Principles, 3rd ed. New York: McGraw-Hill, Inc., 1993.
An undergraduate text that covers essentially the same topics as the present text, although at a slightly lower mathematical level. It does contain many excellent problems.

13. Spiegel, M. R., Theory and Problems of Probability and Statistics. Schaum's Outline Series in Mathematics. New York: McGraw-Hill, Inc., 1975.
This is a typical outline that might be useful for self-study when used in conjunction with the present text. Although short on discussion, it does contain all of the basic definitions and many worked-out examples. There are also many excellent problems for which answers are provided. This text is one of the few that contains material on statistics as well as probability.
CHAPTER 2

Random Variables

2-1 Concept of a Random Variable

The previous chapter deals exclusively with situations in which the number of possible outcomes associated with any experiment is finite. Although it is never stated that the outcomes had to be finite in number (because, in fact, they do not), such an assumption is implied and is certainly true for such illustrative experiments as tossing coins, throwing dice, and selecting resistors from bins. There are many other experiments, however, in which the number of possible outcomes is not finite, and it is the purpose of this chapter to introduce ways of describing such experiments in accordance with the concepts of probability already established.

A good way to introduce this type of situation is to consider again the experiment of selecting a resistor from a bin. When mention is made, in the previous chapter, of selecting a 1-Ω resistor, or a 10-Ω resistor, or any other value, the implied meaning is that the selected resistor is labeled "1 Ω" or "10 Ω." The actual value of resistance is expected to be close to the labeled value, but might differ from it by some unknown (but measurable) amount. The deviations from the labeled value are due to manufacturing variations and can assume any value within some specified range. Since the actual value of resistance is unknown in advance, it is a random variable.

To carry this illustration further, consider a bin of resistors that are all marked "100 Ω." Because of manufacturing tolerances, each of the resistors in the bin will have a slightly different resistance value. Furthermore, there are an infinite number of possible resistance values, so that the experiment of selecting one resistor has an infinite number of possible outcomes. Even if it is known that all of the resistance values lie between 99.99 Ω and 100.01 Ω, there are an infinite number of such values in this range.
Thus, if one defines a particular event as the selection of a resistor with a resistance of exactly 100.00 Ω, the probability of this event is actually zero. On the other hand, if one were to define an event as the selection of a resistor having a resistance
between 99.9999 Ω and 100.0001 Ω, the probability of this event is nonzero. The actual value of resistance, however, is a random variable that can assume any value in a specified range of values.

It is also possible to associate random variables with time functions, and, in fact, most of the applications that are considered in this text are of this type. Although Chapter 3 will deal exclusively with such random variables and random time functions, it is worth digressing momentarily, at this point, to note the relationship between the two as it provides an important physical motivation for the present study.

A typical random time function, shown in Figure 2-1, is designated as $x(t)$. In a given physical situation, this particular time function is only one of an infinite number of time functions that might have occurred. The collection of all possible time functions that might have been observed belongs to a random process, which will be designated as $\{x(t)\}$. When the probability functions are also specified, this collection is referred to as an ensemble. Any particular member of the ensemble, say $x(t)$, is a sample function, and the value of the sample function at some particular time, say $t_1$, is a random variable, which we call $X(t_1)$ or simply $X_1$. Thus, $X_1 = x(t_1)$ when $x(t)$ is the particular sample function observed.

A random variable associated with a random process is a considerably more involved concept than the random variable associated with the resistor above. In the first place, there is a different random variable for each instant of time, although there usually is some relation between two random variables corresponding to two different time instants. In the second place, the randomness we are concerned with is the randomness that exists from sample function to sample function throughout the complete ensemble.
There may also be randomness from time instant to time instant, but this is not an essential ingredient of a random process. Therefore, the probability description of the random variables being considered here is also the probability description of the random process. However, our initial discussion will concentrate on the random variables and will be extended later to the random process.

[Figure 2-1: A random time function x(t).]

From an engineering viewpoint, a random variable is simply a numerical description of the outcome of a random experiment. Recall that the sample space $S = \{a\}$ is the set of all possible outcomes of the experiment. When the outcome is $a$, the random variable $X$ has a value that we might denote as $X(a)$. From this viewpoint, a random variable is simply a real-valued function defined over the sample space, and in fact the fundamental definition of a random variable is simply as such a function (with a few restrictions needed for mathematical consistency). For engineering applications, however, it is usually not necessary to consider explicitly the
underlying sample space. It is generally only necessary to be able to assign probabilities to various events associated with the random variables of interest, and these probabilities can often be inferred directly from the physical situation. What events are required for a complete description of the random variable, and how the appropriate probabilities can be inferred, form the subject matter for the rest of this chapter.

If a random variable can assume any value within a specified range (possibly infinite), then it will be designated as a continuous random variable. In the following discussion all random variables will be assumed to be continuous unless stated otherwise. It will be shown, however, that discrete random variables (that is, those assuming one of a countable set of values) can also be treated by exactly the same methods.

2-2 Distribution Functions

To consider continuous random variables within the framework of probability concepts discussed in the last chapter, it is necessary to define the events to be associated with the probability space. There are many ways in which events might be defined, but the method to be described below is almost universally accepted.

Let $X$ be a random variable as defined above and $x$ be any allowed value of this random variable. The probability distribution function is defined to be the probability of the event that the observed random variable $X$ is less than or equal to the allowed value $x$. That is,¹

$$F_X(x) = \Pr(X \le x)$$

¹ The subscript $X$ denotes the random variable while the argument $x$ could equally well be any other symbol. In much of the subsequent discussion it is convenient to suppress the subscript $X$ when no confusion will result. Thus $F_X(x)$ will often be written $F(x)$.

Since the probability distribution function is a probability, it must satisfy the basic axioms and must have the same properties as the probabilities discussed in Chapter 1. However, it is also a function of $x$, the possible values of the random variable $X$, and as such must generally be defined for all values of $x$. Thus, the requirement that it be a probability imposes certain constraints upon the functional nature of $F_X(x)$. These may be summarized as follows:

1. $0 \le F_X(x) \le 1, \quad -\infty < x < \infty$
2. $F_X(-\infty) = 0, \quad F_X(\infty) = 1$
3. $F_X(x)$ is nondecreasing as $x$ increases.
4. $\Pr(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1)$

Some possible distribution functions are shown in Figure 2-2. The sketch in (a) indicates a continuous random variable having possible values ranging from $-\infty$ to $\infty$ while (b) shows a continuous random variable for which the possible values lie between $a$ and $b$. The sketch in (c) shows the probability distribution function for a discrete random variable that can assume only four possible values (that is, 0, $a$, $b$, or $c$). In distribution functions of this type it is important to
remember that the definition for $F_X(x)$ includes the condition $X = x$ as well as $X < x$. Thus, in Figure 2-2(c), it follows (for example) that $F_X(a) = 0.4$ and not 0.2.

[Figure 2-2: Some possible probability distribution functions, panels (a), (b), and (c).]

The probability distribution function can also be used to express the probability of the event that the observed random variable $X$ is greater than (but not equal to) $x$. Since this event is simply the complement of the event having probability $F_X(x)$, it follows that

$$\Pr(X > x) = 1 - F_X(x)$$

As a specific illustration, consider the probability distribution function shown in Figure 2-3. Note that this function satisfies all of the requirements listed above. It is easy to see from the figure that the following statements (among many other possible statements) are true:

$$
\begin{aligned}
\Pr(X \le -5) &= 0.25 \\
\Pr(X > -5) &= 1 - 0.25 = 0.75 \\
\Pr(X > 8) &= 1 - 0.9 = 0.1 \\
\Pr(-5 < X \le 8) &= 0.9 - 0.25 = 0.65 \\
\Pr(X > 0) &= 1 - \Pr(X \le 0) = 1 - 0.5 = 0.5
\end{aligned}
$$

[Figure 2-3: A specific probability distribution function.]
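The statements above can be reproduced numerically. The following MATLAB sketch is only an illustration: Figure 2-3 is not reproduced here, so the sketch assumes that the distribution function rises linearly from 0 at $x = -10$ to 1 at $x = 10$, which is consistent with every value quoted above. The anonymous function `F` is a name introduced for this example.

```matlab
% Sketch only: assumes F_X(x) rises linearly from 0 at x = -10 to 1 at x = 10,
% which matches the values read from Figure 2-3 above.
F = @(x) min(max((x + 10) / 20, 0), 1);   % clamp so that 0 <= F <= 1

p1 = F(-5);            % Pr(X <= -5)
p2 = 1 - F(-5);        % Pr(X > -5), complement of the event X <= -5
p3 = 1 - F(8);         % Pr(X > 8)
p4 = F(8) - F(-5);     % Pr(-5 < X <= 8), property 4
p5 = 1 - F(0);         % Pr(X > 0)
fprintf('%.2f  %.2f  %.2f  %.2f  %.2f\n', p1, p2, p3, p4, p5)

% Properties 1 and 3 checked on a grid: values in [0,1] and nondecreasing
x = linspace(-20, 20, 401);
assert(all(F(x) >= 0 & F(x) <= 1) && all(diff(F(x)) >= 0))
```

Running it should print 0.25, 0.75, 0.10, 0.65, and 0.50, the same values listed above; the final `assert` confirms on a grid that the assumed function satisfies the listed properties.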
In the example above, all of the variation of the probability distribution function takes place between finite limits. This is not always the case, however. Consider, for example, a probability distribution function defined by

$$F_X(x) = \frac{1}{2}\left(1 + \frac{2}{\pi}\tan^{-1}\frac{x}{5}\right), \qquad -\infty < x < \infty \tag{2-1}$$

and shown in Figure 2-4. Again, there are many different statements that can be made concerning the probability that the random variable $X$ lies in certain regions. For example, it is straightforward to verify that all of the following are true:

$$
\begin{aligned}
\Pr(X \le -5) &= 0.25 \\
\Pr(X > -5) &= 1 - 0.25 = 0.75 \\
\Pr(X > 8) &= 1 - 0.8222 = 0.1778 \\
\Pr(-5 < X \le 8) &= 0.8222 - 0.25 = 0.5722 \\
\Pr(X > 0) &= 1 - \Pr(X \le 0) = 0.5
\end{aligned}
$$

[Figure 2-4: A probability distribution function with infinite range, plotted for −20 ≤ x ≤ 20.]

Exercise 2-2.1

A random experiment consists of flipping four coins and taking the random variable to be the number of heads.
a) Sketch the distribution function for this random variable.
b) What is the probability that the random variable is less than 3.5?
c) What is the probability that the random variable is greater than 2.5?
d) What is the probability that the random variable is greater than 0.5 and less than or equal to 3.0?

Answers: 15/16, 7/8, 5/16.

Exercise 2-2.2

A particular random variable has a probability distribution function given by

$$F_X(x) = \begin{cases} 0, & -\infty < x \le 0 \\ 1 - e^{-2x}, & 0 \le x < \infty \end{cases}$$

Find
a) the probability that $X > 0.5$
b) the probability that $X \le 0.25$
c) the probability that $0.3 < X \le 0.7$.

Answers: 0.3022, 0.3935, 0.3679

2-3 Density Functions

Although the distribution function is a complete description of the probability model for a single random variable, it is not the most convenient form for many calculations of interest. For these, it may be preferable to use the derivative of $F(x)$ rather than $F(x)$ itself. This derivative is called the probability density function and, when it exists, it is defined by²

$$f_X(x) = \lim_{\varepsilon \to 0} \frac{F_X(x + \varepsilon) - F_X(x)}{\varepsilon} = \frac{dF_X(x)}{dx}$$

The physical significance of the probability density function is best described in terms of the probability element, $f_X(x)\,dx$. This may be interpreted as

$$f_X(x)\,dx = \Pr(x < X \le x + dx) \tag{2-2}$$

² Again, the subscript denotes the random variable and when no confusion results, it may be omitted. Thus, $f_X(x)$ will often be written as $f(x)$.
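The limit in the definition of the density can be illustrated numerically. This MATLAB sketch is an illustration rather than part of the original development: it uses the distribution function of Exercise 2-2.2, whose derivative for $x > 0$ is $2e^{-2x}$, and forms the difference quotient for a few shrinking step sizes `h` (a name chosen here to stand in for $\varepsilon$). Because the quotient equals $\Pr(x < X \le x + h)/h$, the same numbers also illustrate the probability-element interpretation (2-2).

```matlab
% Sketch: difference-quotient approximation of f_X(x) = dF_X(x)/dx,
% using F_X(x) = 1 - exp(-2x) for x > 0 (Exercise 2-2.2).
F = @(x) (x > 0) .* (1 - exp(-2*x));   % F_X(x) = 0 for x <= 0
f_exact = @(x) 2*exp(-2*x);            % derivative of F_X for x > 0

x = [0.25 0.5 1 2];                    % evaluation points (all > 0)
for h = [1e-1 1e-3 1e-5]
    % [F(x+h) - F(x)]/h is also Pr(x < X <= x+h)/h
    f_approx = (F(x + h) - F(x)) / h;
    err = max(abs(f_approx - f_exact(x)));
    fprintf('h = %-7g max error = %.2e\n', h, err)
end
```

The maximum error should shrink roughly in proportion to `h`, as expected of a forward difference, so the quotient approaches the density as the interval width goes to zero.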
Equation (2-2) simply states that the probability element, $f_X(x)\,dx$, is the probability of the event that the random variable $X$ lies in the range of possible values between $x$ and $x + dx$.

Since $f_X(x)$ is a density function and not a probability, it is not necessary that its value be less than 1; it may have any nonnegative value.³ Its general properties may be summarized as follows:

1. $f_X(x) \ge 0, \quad -\infty < x < \infty$
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$

³ Because $F_X(x)$ is nondecreasing as $x$ increases.

As examples of probability density functions, those corresponding to the distribution functions of Figure 2-2 are shown in Figure 2-5. Note particularly that the density function for a discrete random variable consists of a set of delta functions, each having an area equal to the magnitude of the corresponding discontinuity in the distribution function. It is also possible to have density functions that contain both a continuous part and one or more delta functions.

[Figure 2-5: Probability density functions corresponding to the distribution functions of Figure 2-2, panels (a), (b), and (c).]

There are many different mathematical forms that might be probability density functions, but only a very few of these arise to any significant extent in the analysis of engineering systems. Some of these are considered in subsequent sections and a table containing numerous density functions is given in Appendix B.

Before considering the more important probability density functions, however, let us look at the density functions that are associated with the probability distribution functions described in the previous section. It is clear from Figure 2-3 that the probability density function associated with this random variable must be zero for $x \le -10$ and $x > 10$. Furthermore, in the interval
between -10 and 10 it must have a constant value, since the slope of the distribution function is constant. Thus:
$$f_X(x) = \begin{cases} 0, & x \le -10 \\ 0.05, & -10 < x \le 10 \\ 0, & x > 10 \end{cases}$$
This is sketched in Figure 2-6.

Figure 2-6 Probability density function corresponding to the distribution function of Figure 2-3.

The probability density function corresponding to the distribution function of Figure 2-4 can be obtained by differentiating the distribution function of (2-1). Thus,
$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{d}{dx}\left[\frac{1}{2} + \frac{1}{\pi}\tan^{-1}\frac{x}{5}\right] = \frac{5}{\pi}\left(\frac{1}{x^2 + 25}\right), \qquad -\infty < x < \infty \tag{2-3}$$
This probability density function is displayed in Figure 2-7.

Figure 2-7 Probability density function corresponding to the distribution function of Figure 2-4 (peak value 0.0637 at x = 0).

A situation that frequently occurs in the analysis of engineering systems is that in which one random variable is functionally related to another random variable whose probability density function is known, and it is desired to determine the probability density function of the first random variable. For example, it may be desired to find the probability density function of a
power variable when the probability density function of the corresponding voltage or current variable is known. Or it may be desired to find the probability density function after some nonlinear operation is performed on a voltage or current. Although a complete discussion of this problem is not necessary here, a few elementary concepts can be presented and will be useful in subsequent discussions.

To formulate the mathematical framework, let the random variable Y be a single-valued, real function of another random variable X. Thus, Y = g(X),⁴ in which it is assumed that the probability density function of X is known and is denoted by $f_X(x)$, and it is desired to find the probability density function of Y, which is denoted by $f_Y(y)$. If it is assumed for the moment that g(X) is a monotonically increasing function of X, then the situation shown in Figure 2-8(a) applies. It is clear that whenever the random variable X lies between x and x + dx, the random variable Y will lie between y and y + dy. Since the probabilities of these events are $f_X(x)\,dx$ and $f_Y(y)\,dy$, one can immediately write
$$f_Y(y)\,dy = f_X(x)\,dx$$
from which the desired probability density function becomes
$$f_Y(y) = f_X(x)\,\frac{dx}{dy} \tag{2-4}$$
Of course, in the right side of (2-4), x must be replaced by its corresponding function of y.

When g(X) is a monotonically decreasing function of X, as shown in Figure 2-8(b), a similar result is obtained except that the derivative is negative. Since probability density functions must be nonnegative, and also from the geometry of the figure, it is clear that what is needed in (2-4) is simply the absolute value of the derivative. Hence, for either situation
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| \tag{2-5}$$

Figure 2-8 Transformation of variables: (a) monotonically increasing g(x); (b) monotonically decreasing g(x).

⁴ This also implies that the possible values of X and Y are related by y = g(x).
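Equation (2-5) translates directly into a small helper. In the sketch below, `transformed_pdf`, `f_x`, and `g_inv` are hypothetical names, and $dx/dy$ is estimated by a central difference rather than derived by hand. The example assumes X is uniform on (-10, 10] and uses the decreasing map $y = 1 - x/2$, so the absolute value in (2-5) matters.

```python
def transformed_pdf(f_x, g_inv, y, h=1e-6):
    """f_Y(y) = f_X(x) |dx/dy| from (2-5), for a monotonic g with inverse g_inv.

    dx/dy is estimated numerically with a central difference of g_inv.
    """
    x = g_inv(y)
    dx_dy = (g_inv(y + h) - g_inv(y - h)) / (2.0 * h)
    return f_x(x) * abs(dx_dy)

def f_x(x):
    """Uniform density of height 0.05 on (-10, 10], the case sketched in Figure 2-6."""
    return 0.05 if -10.0 < x <= 10.0 else 0.0

def g_inv(y):
    """Inverse of the decreasing map y = 1 - x/2 (a hypothetical example)."""
    return 2.0 * (1.0 - y)

print(transformed_pdf(f_x, g_inv, 0.0))   # about 0.1: |dx/dy| = 2 doubles the height
print(transformed_pdf(f_x, g_inv, 7.0))   # 0.0: y = 7 maps to x = -12, outside the support
```

Because dx/dy is negative here, dropping the `abs` would produce a negative "density", which is exactly the situation (2-5) guards against.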
To illustrate the transformation of variables, consider first the problem of scaling the amplitude of a random variable. Assume that we have a random variable X whose probability density function $f_X(x)$ is known. We then consider another random variable Y that is linearly related to X by Y = AX. This situation arises, for example, when X is the input to an amplifier and Y is its output. Since the possible values of X and Y are related in the same way, it follows that
$$\frac{dy}{dx} = A$$
From (2-5) it is clear that the probability density function of Y is
$$f_Y(y) = \frac{1}{|A|}\,f_X\!\left(\frac{y}{A}\right)$$
Thus, it is very easy to find the probability density of any random variable that is simply a scaled version of another random variable whose density function is known.

Consider next a specific example of the transformation of random variables by assuming that the random variable X has a density function of the form
$$f_X(x) = e^{-x}\,u(x)$$
where u(x) is the unit step starting at x = 0. Now consider another random variable Y that is related to X by
$$Y = X^3$$
Since y and x are related in the same way, it follows that
$$\frac{dy}{dx} = 3x^2$$
and
$$\frac{dx}{dy} = \frac{1}{3x^2} = \frac{1}{3y^{2/3}}$$
Thus, the probability density function of Y is
$$f_Y(y) = \frac{1}{3}\,y^{-2/3}\,e^{-y^{1/3}}\,u(y)$$

There may also be situations in which g(X) has regions in which the derivative is positive and other regions in which it is negative. In such cases, the regions may be considered separately and the corresponding probability densities added. An example of this sort will serve to illustrate such a transformation.
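One way to check the $Y = X^3$ result is to compare it with the derivative of $F_Y(y) = \Pr(X \le y^{1/3}) = 1 - e^{-y^{1/3}}$, which follows from the same monotonic mapping. The sketch below assumes the exponential density $f_X(x) = e^{-x}u(x)$ given above. The function names are illustrative.

```python
import math

def f_y_cube(y):
    """Density of Y = X**3 when f_X(x) = exp(-x) u(x): (1/3) y^(-2/3) exp(-y^(1/3)) u(y)."""
    if y <= 0.0:
        return 0.0
    return (1.0 / 3.0) * y ** (-2.0 / 3.0) * math.exp(-y ** (1.0 / 3.0))

def F_y_cube(y):
    """Pr(Y <= y) = Pr(X <= y^(1/3)) = 1 - exp(-y^(1/3)) for y > 0."""
    return 1.0 - math.exp(-y ** (1.0 / 3.0)) if y > 0.0 else 0.0

# The transformed density should match a central-difference derivative of F_Y
h = 1e-5
for y in (0.5, 2.0, 8.0):
    numeric = (F_y_cube(y + h) - F_y_cube(y - h)) / (2.0 * h)
    print(y, f_y_cube(y), numeric)
```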
Figure 2-9 The square-law transformation.

Let the functional relationship be
$$Y = X^2$$
This is shown in Figure 2-9 and represents, for example, the transformation (except for a scale factor) of a voltage random variable into a power random variable. Since the derivative, dx/dy, has an absolute value given by
$$\left|\frac{dx}{dy}\right| = \frac{1}{2|x|} = \frac{1}{2\sqrt{y}}$$
and since there are two x-values for every y-value ($x = \pm\sqrt{y}$), the desired probability density function is simply
$$f_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right], \qquad y \ge 0 \tag{2-6}$$
Furthermore, since y can never be negative,
$$f_Y(y) = 0, \qquad y < 0$$
Some other applications of random variable transformations are considered later.

Exercise 2-3.1
The probability density function of a random variable has the form $f_X(x) = 5e^{-Kx}u(x)$, where u(x) is the unit step function. Find
a) the value of K
b) the probability that X > 1
c) the probability that X ≤ 0.5.
Answers: 0.0067, 5, 0.9179
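As a numerical sanity check of (2-6), the sketch below applies it to a zero-mean, unit-variance Gaussian X. That input is an assumption, chosen because $\Pr(X^2 \le y) = \operatorname{erf}(\sqrt{y/2})$ is available through Python's `math` module. The result is compared with a finite-difference derivative of that distribution function. The function names are illustrative.

```python
import math

def phi(x):
    """Zero-mean, unit-variance Gaussian density (used here only as an example input)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def square_law_pdf(f_x, y):
    """f_Y(y) for Y = X**2 from (2-6): add the contributions of x = +sqrt(y) and x = -sqrt(y)."""
    if y <= 0.0:
        return 0.0   # f_Y(y) = 0 for y < 0; the single point y = 0 carries no probability
    r = math.sqrt(y)
    return (f_x(r) + f_x(-r)) / (2.0 * r)

def F_square(y):
    """Pr(X**2 <= y) = Pr(-sqrt(y) <= X <= sqrt(y)) = erf(sqrt(y/2)) for the same Gaussian X."""
    return math.erf(math.sqrt(y / 2.0)) if y > 0.0 else 0.0

y0, h = 1.0, 1e-6
numeric = (F_square(y0 + h) - F_square(y0 - h)) / (2.0 * h)
print(square_law_pdf(phi, y0), numeric)   # both about 0.242
```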
Exercise 2-3.2
A random variable Y is related to the random variable X of Exercise 2-3.1 by
$$Y = 5X + 3$$
Find the probability density function of Y.
Answer: $e^{3-y}\,u(y - 3)$

2-4 Mean Values and Moments

One of the most important and most fundamental concepts associated with statistical methods is that of finding average values of random variables or functions of random variables. The concept of finding average values for time functions by integrating over some time interval, and then dividing by the length of the interval, is a familiar one to electrical engineers, since operations of this sort are used to find the dc component, the root-mean-square value, or the average power of the time function. Such time averages may also be important for random functions of time but, of course, have no meaning when considering a single random variable, which is defined as the value of the time function at a single instant of time. Instead, it is necessary to find the average value by integrating over the range of possible values that the random variable may assume. Such an operation is referred to as "ensemble averaging," and the result is the mean value.

Several different notations are in standard use for the mean value, but the most common ones in engineering literature are⁵
$$\bar{X} = E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx \tag{2-7}$$
The symbol E[X] is usually read "the expected value of X" or "the mathematical expectation of X." It is shown later that in many cases of practical interest, the mean value of a random variable is equal to the time average of any sample function from the random process to which the random variable belongs. In such cases, finding the mean value of a random voltage or current is equivalent to finding its dc component; this interpretation will be employed here for illustration.

The expected value of any function of x can also be obtained by a similar calculation. Thus,
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx \tag{2-8}$$

⁵ Note that the subscript X has been omitted from f(x) since there is no doubt as to what the random variable is.
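Integrals such as (2-7) and (2-8) can also be approximated numerically when no closed form is handy. The sketch below uses a plain midpoint rule, an illustrative choice that is not a method from the text. It assumes the integration interval covers essentially all of the probability, and it is applied to the uniform density of Figure 2-6. The names are hypothetical.

```python
def expectation(g, f, a, b, n=100_000):
    """Midpoint-rule approximation of E[g(X)] = integral of g(x) f(x) dx, eq. (2-8).

    [a, b] must cover (essentially) all of the probability of X.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += g(x) * f(x)
    return total * h

def f_uniform(x):
    """Uniform density of height 0.05 on (-10, 10], as in Figure 2-6."""
    return 0.05 if -10.0 < x <= 10.0 else 0.0

area = expectation(lambda x: 1.0, f_uniform, -10.0, 10.0)           # should be 1
mean = expectation(lambda x: x, f_uniform, -10.0, 10.0)             # should be 0
mean_square = expectation(lambda x: x * x, f_uniform, -10.0, 10.0)  # should be 100/3
print(area, mean, mean_square)
```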
A function of particular importance is $g(x) = x^n$, since this leads to the general moments of the random variable. Thus,
$$\overline{X^n} = E[X^n] = \int_{-\infty}^{\infty} x^n f(x)\,dx \tag{2-9}$$
By far the most important moments of X are those given by n = 1, which is the mean value discussed above, and by n = 2, which leads to the mean-square value.
$$\overline{X^2} = E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx \tag{2-10}$$
The importance of the mean-square value lies in the fact that it may often be interpreted as being equal to the time average of the square of a random voltage or current. In such cases, the mean-square value is proportional to the average power (in a resistor) and its square root is equal to the rms or effective value of the random voltage or current.

It is also possible to define central moments, which are simply the moments of the difference between a random variable and its mean value. Thus the nth central moment is
$$\overline{(X - \bar{X})^n} = E\left[(X - \bar{X})^n\right] = \int_{-\infty}^{\infty} (x - \bar{X})^n f(x)\,dx \tag{2-11}$$
The central moment for n = 1 is, of course, zero, while the central moment for n = 2 is so important that it carries a special name, the variance, and is usually symbolized by σ². Thus,
$$\sigma^2 = \overline{(X - \bar{X})^2} = \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x)\,dx \tag{2-12}$$
The variance can also be expressed in an alternative form by using the rules for the expectations of sums; that is,
$$E[X_1 + X_2 + \cdots + X_m] = E[X_1] + E[X_2] + \cdots + E[X_m]$$
Thus,
$$\begin{aligned}
\sigma^2 = E[(X - \bar{X})^2] &= E[X^2 - 2X\bar{X} + (\bar{X})^2] \\
&= E[X^2] - 2E[X]\bar{X} + (\bar{X})^2 \\
&= \overline{X^2} - 2\bar{X}\bar{X} + (\bar{X})^2 = \overline{X^2} - (\bar{X})^2
\end{aligned} \tag{2-13}$$
and it is seen that the variance is the difference between the mean-square value and the square of the mean value. The square root of the variance, σ, is known as the standard deviation.

In electrical circuits, the variance can often be related to the average power (in a resistance) of the ac components of a voltage or current. The square root of the variance would be the value
indicated by an ac voltmeter or ammeter of the rms type that does not respond to direct current (because of capacitive coupling, for example).

To illustrate some of the above ideas concerning mean values and moments, consider a random variable having a uniform probability density function as shown in Figure 2-10. A voltage waveform that would lead to such a probability density function might be a sawtooth waveform that varied linearly between 20 and 40 V. The appropriate mathematical representation for this density function is
$$f(x) = \begin{cases} 0, & -\infty < x \le 20 \\ \dfrac{1}{20}, & 20 < x \le 40 \\ 0, & 40 < x < \infty \end{cases}$$

Figure 2-10 A uniform probability density function (height 1/20 between x = 20 and x = 40).

The mean value of this random variable is obtained by using (2-7). Thus,
$$\bar{X} = \int_{20}^{40} x\left(\frac{1}{20}\right)dx = \frac{1}{20}\cdot\frac{x^2}{2}\bigg|_{20}^{40} = \frac{1}{40}(1600 - 400) = 30$$
This value is intuitively the average value of the sawtooth waveform just described. The mean-square value is obtained from (2-10) as
$$\overline{X^2} = \int_{20}^{40} x^2\left(\frac{1}{20}\right)dx = \frac{1}{20}\cdot\frac{x^3}{3}\bigg|_{20}^{40} = \frac{1}{60}(64 - 8)10^3 = 933.3$$
The variance of the random variable can be obtained from either (2-12) or (2-13). From the latter,
$$\sigma^2 = \overline{X^2} - (\bar{X})^2 = 933.3 - (30)^2 = 33.3$$
On the basis of the assumptions that will be made concerning random processes, if the sawtooth voltage were measured with a dc voltmeter, the reading would be 30 V. If it were measured with an rms-reading ac voltmeter (which did not respond to dc), the reading would be $\sqrt{33.3}$ V.

As a second illustration of the determination of the moments of a random variable, consider the probability density function
$$f(x) = kx\,[u(x) - u(x - 1)]$$
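The value of k can be determined from the 0th moment of f(x), since that is just the area of the density function and must be 1. Thus,
$$\int_0^1 kx\,dx = \frac{k}{2} = 1 \quad\Rightarrow\quad k = 2$$
The mean and mean-square value of X may now be calculated readily as
$$\bar{X} = \int_0^1 x(2x)\,dx = \frac{2}{3}$$
$$\overline{X^2} = \int_0^1 x^2(2x)\,dx = \frac{1}{2}$$
From these two quantities the variance becomes
$$\sigma^2 = \overline{X^2} - (\bar{X})^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$$
Likewise, the 4th moment of X is
$$\overline{X^4} = \int_0^1 x^4(2x)\,dx = \frac{1}{3}$$
and the 4th central moment is given by
$$\overline{(X - \bar{X})^4} = \int_0^1 \left(x - \frac{2}{3}\right)^4 (2x)\,dx = \frac{1}{135}$$
This latter integration is facilitated by observing that $2x = 2\left(x - \frac{2}{3}\right) + \frac{4}{3}$, so that the integrand becomes $2\left(x - \frac{2}{3}\right)^5 + \frac{4}{3}\left(x - \frac{2}{3}\right)^4$, each term of which integrates directly.

Exercise 2-4.1
For the random variable of Exercise 2-3.1, find
a) the mean value of X
b) the mean-square value of X
c) the variance of X.
Answers: 2/25, 1/5, 1/25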
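Both example densities are polynomials on a finite interval, so their moments can be computed exactly with rational arithmetic. The sketch below uses Python `fractions`, and its function names are illustrative. It gets raw moments by direct integration and central moments from the binomial expansion of $(X - \bar X)^n$. The results reproduce 30, 933.3, and 33.3 for the uniform case, and 2/3, 1/2, 1/18, 1/3, and 1/135 for $f(x) = 2x$.

```python
from fractions import Fraction
from math import comb

def raw_moment_uniform(k, a, b):
    """E[X^k] for X uniform on (a, b]: (b^(k+1) - a^(k+1)) / ((k+1)(b-a))."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))

def raw_moment_ramp(k):
    """E[X^k] for f(x) = 2x on [0, 1]: integral of 2 x^(k+1) dx = 2/(k+2)."""
    return Fraction(2, k + 2)

def central_moment(raw, n):
    """nth central moment from raw moments raw(j), expanding (X - mean)^n binomially."""
    mean = raw(1)
    return sum(comb(n, j) * raw(j) * (-mean) ** (n - j) for j in range(n + 1))

def uniform_raw(k):
    """Raw moments of the 20-to-40 V sawtooth example."""
    return raw_moment_uniform(k, 20, 40)

print(uniform_raw(1), uniform_raw(2), central_moment(uniform_raw, 2))   # 30 2800/3 100/3
print(raw_moment_ramp(1), raw_moment_ramp(2), central_moment(raw_moment_ramp, 2),
      raw_moment_ramp(4), central_moment(raw_moment_ramp, 4))           # 2/3 1/2 1/18 1/3 1/135
```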
Exercise 2-4.2
A random variable X has a probability density function of the form
$$f_X(x) = \frac{1}{4}\left[u(x) - u(x - 4)\right]$$
For the random variable Y = X², find
a) the mean value
b) the mean-square value
c) the variance.
Answers: 16/3, 256/5, 1024/45

2-5 The Gaussian Random Variable

Of the various density functions that we shall study, the most important by far is the Gaussian or normal density function. There are many reasons for its importance, some of which are as follows:
1. It provides a good mathematical model for a great many different physically observed random phenomena. Furthermore, the fact that it should be a good model can be justified theoretically in many cases.
2. It is one of the few density functions that can be extended to handle an arbitrarily large number of random variables conveniently.
3. Linear combinations of Gaussian random variables lead to new random variables that are also Gaussian. This is not true for most other density functions.
4. The random process from which Gaussian random variables are derived can be completely specified, in a statistical sense, from a knowledge of all first and second moments only. This is not true for other processes.
5. In system analysis, the Gaussian process is often the only one for which a complete statistical analysis can be carried through in either the linear or the nonlinear situation.

The mathematical representation of the Gaussian density function is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right], \qquad -\infty < x < \infty \tag{2-14}$$
where $\bar{X}$ and σ² are the mean and variance, respectively. The corresponding distribution function cannot be written in closed form. The shapes of the density function and distribution function
are shown in Figure 2-11.

Figure 2-11 The Gaussian random variable: (a) density function, with peak value $1/(\sqrt{2\pi}\,\sigma)$ at $x = \bar{X}$ and value $0.607/(\sqrt{2\pi}\,\sigma)$ at $x = \bar{X} \pm \sigma$; (b) distribution function, equal to 0.5 at $\bar{X}$ and 0.841 at $\bar{X} + \sigma$.

There are a number of points in connection with these curves that are worth noting:
1. There is only one maximum and it occurs at the mean value.
2. The density function is symmetrical about the mean value.
3. The width of the density function is directly proportional to the standard deviation, σ. The width of 2σ occurs at the points where the height is 0.607 of the maximum value. These are also the points of maximum absolute slope.
4. The maximum value of the density function is inversely proportional to the standard deviation σ. Since the density function has an area of unity, it can be used as a representation of the impulse or delta function by letting σ approach zero. That is
$$\delta(x - \bar{X}) = \lim_{\sigma \to 0} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right] \tag{2-15}$$
This representation of the delta function has an advantage over some others of being infinitely differentiable.

The Gaussian distribution function cannot be expressed in closed form in terms of elementary functions. It can, however, be expressed in terms of functions that are commonly tabulated. From the relation between density and distribution functions it follows that the general Gaussian distribution function is
$$F(x) = \int_{-\infty}^{x} f(u)\,du = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x}\exp\left[-\frac{(u - \bar{X})^2}{2\sigma^2}\right]du \tag{2-16}$$
The function that is usually tabulated is the distribution function for a Gaussian random variable that has a mean value of zero and a variance of unity (that is, $\bar{X} = 0$, σ = 1). This distribution function is often designated by Φ(x) and is defined by
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\frac{u^2}{2}\right)du \tag{2-17}$$
By means of a simple change of variable it is easy to show that the general Gaussian distribution function of (2-16) can be expressed in terms of Φ(x) by
$$F(x) = \Phi\left(\frac{x - \bar{X}}{\sigma}\right) \tag{2-18}$$
An abbreviated table of values for Φ(x) is given in Appendix D. Since only positive values of x are tabulated, it is frequently necessary to use the additional relationship
$$\Phi(-x) = 1 - \Phi(x) \tag{2-19}$$
Another function that is closely related to Φ(x), and is often more convenient to use, is the Q-function defined by
$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}\exp\left(-\frac{u^2}{2}\right)du \tag{2-20}$$
and for which
$$Q(-x) = 1 - Q(x)$$
Upon comparing this with (2-17), it is clear that
$$Q(x) = 1 - \Phi(x)$$
Likewise, comparing with (2-18),
$$F(x) = 1 - Q\left(\frac{x - \bar{X}}{\sigma}\right) \tag{2-21}$$
A brief table of values for Q(x) is given in Appendix E for small values of x.

Several alternative notations are encountered in the literature. In particular, in the mathematical literature and in mathematical tables, a quantity defined as the error function is commonly encountered. This function is defined as
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du \tag{2-22}$$
The Q-function is related to the error function by the following equation:
$$Q(x) = \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right] \tag{2-23}$$
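Python's standard `math` module provides `erf` and `erfc`, so (2-17), (2-18), and (2-23) can be evaluated directly. The sketch below is a minimal version under that assumption. Using `erfc` for Q avoids the loss of precision that $1 - \operatorname{erf}(\cdot)$ suffers for large arguments. The checks use Q(0) = 1/2 and the value 0.841 of Φ(1) marked in Figure 2-11(b). The function names are illustrative.

```python
import math

def Phi(x):
    """Standard Gaussian distribution function (2-17), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Q-function (2-20); (1/2) erfc(x / sqrt 2) equals (2-23) because erfc = 1 - erf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_cdf(x, mean, sigma):
    """General Gaussian distribution function via (2-18)."""
    return Phi((x - mean) / sigma)

print(Q(0.0), Phi(1.0))          # 0.5 and about 0.8413
print(Q(-1.3), 1.0 - Q(1.3))     # Q(-x) = 1 - Q(x)
print(gaussian_cdf(0.0, 1.0, 2.0))
```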
The error function is a built-in function of the MATLAB application and can be used to calculate the Q-function. Often of equal importance are the inverses of these functions, which are needed to find the parameters that lead to observed or specified probabilities of events. Appendix G discusses computation of the Q-function and the Qinv-function using MATLAB.

The Q-function is bounded by two readily calculated analytical expressions as follows:
$$\left(1 - \frac{1}{a^2}\right)\frac{1}{a\sqrt{2\pi}}e^{-a^2/2} \le Q(a) \le \frac{1}{a\sqrt{2\pi}}e^{-a^2/2} \tag{2-24}$$
Figure 2-12 shows the Q-function and the two bounds. It is seen that when the argument is greater than about 3 the bounds closely approximate the Q-function. The bounds are important because they allow closed-form analytical solutions to be obtained in many cases that would otherwise allow only graphical or numerical results. By averaging the two bounds, an approximation to the Q-function can be obtained that, for arguments below roughly 2.8, is closer than either bound (for larger arguments the lower bound is the more accurate of the two). This approximation is given by
$$Q(a) \approx \left(1 - \frac{1}{2a^2}\right)\frac{1}{a\sqrt{2\pi}}e^{-a^2/2} \tag{2-25}$$
A plot of this function along with the Q-function is shown in Figure 2-13.

Figure 2-12 Bounds of the Q-function (horizontal axis: Argument; logarithmic vertical axis).
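The bounds (2-24) and the averaged approximation (2-25) are easy to tabulate against the exact Q-function. In the sketch below, the names are illustrative and Q is computed from `math.erfc`. Checking the numbers shows that the averaged form is the more accurate one near a = 2, while by a = 4 the lower bound is already closer to Q(a).

```python
import math

def Q(x):
    """Q-function from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_bounds(a):
    """Lower and upper bounds of (2-24), for a > 0 (the lower bound is <= 0 when a <= 1)."""
    base = math.exp(-a * a / 2.0) / (a * math.sqrt(2.0 * math.pi))
    return (1.0 - 1.0 / a ** 2) * base, base

def q_approx(a):
    """Approximation (2-25): the average of the two bounds."""
    lower, upper = q_bounds(a)
    return 0.5 * (lower + upper)

for a in (1.0, 2.0, 3.0, 4.0, 5.0):
    lower, upper = q_bounds(a)
    print(f"a={a}: {lower:.4e} <= Q={Q(a):.4e} <= {upper:.4e}, approx {q_approx(a):.4e}")
```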
Figure 2-13 Q-function and its approximation (horizontal axis: Argument; logarithmic vertical axis).

The Q-function is useful in calculating the probability of events that occur very rarely. An example will serve to illustrate this application. Suppose we have an IC trigger circuit that is supposed to change state whenever the input voltage exceeds 2.5 V; that is, whenever the input goes from a "0" state to a "1" state. Assume that when the input is in the "0" state the voltage is actually 0.5 V, but that there is Gaussian random noise superimposed on this having a variance of 0.2 V². Thus, the input to the trigger circuit can be modeled as a Gaussian random variable with a mean of 0.5 and a variance of 0.2. We wish to determine the probability that the circuit will incorrectly trigger as a result of the random input exceeding 2.5. From the definition of the Q-function, it follows that the desired probability is just $Q[(2.5 - 0.5)/\sqrt{0.2}] = Q(4.472)$. The value of Q(4.472) can be found by interpolation from the table in Appendix E or by using MATLAB, and is $3.875 \times 10^{-6}$.

It is seen that the probability of incorrectly triggering on any one operation is quite small. However, over a period of time in which many operations occur, the probability can become significant. On any one operation, the probability that false triggering does not occur is simply 1 minus the probability that it does occur. Assuming the operations are independent, the probability that false triggering occurs at least once in n operations is
$$\Pr(\text{False Triggering}) = 1 - (1 - 3.875 \times 10^{-6})^n$$
For n = 10⁵, this probability becomes
$$\Pr(\text{False Triggering}) = 0.321$$
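The false-trigger calculation can be reproduced in a few lines. The sketch below uses illustrative names. As the text does, it assumes independent operations with the same single-trial probability p. It evaluates $1 - (1-p)^n$ through `log1p` and `expm1` so that the tiny p is not lost to rounding.

```python
import math

def Q(x):
    """Q-function from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def prob_any_false_trigger(p, n):
    """1 - (1 - p)**n for n independent operations, via log1p/expm1 to keep precision for tiny p."""
    return -math.expm1(n * math.log1p(-p))

mean, variance, threshold = 0.5, 0.2, 2.5          # values from the trigger-circuit example
p = Q((threshold - mean) / math.sqrt(variance))     # single-operation probability, Q(4.472)
print(f"p = {p:.4e}")                               # close to the 3.875e-6 quoted in the text
print(f"n = 1e5: {prob_any_false_trigger(p, 10**5):.3f}")
```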
Suppose that it is desired to find the variance of the noise that would lead to a specified probability of false triggering, e.g., a false trigger probability of 0.01 in 10⁶ triggers. This is essentially the opposite of the situation just considered, and is solved by working the problem in reverse. Thus
$$\Pr(\text{False Triggering}) = 0.01 = 1 - (1 - p)^{10^6}$$
Solving for p gives $p = 1.0050 \times 10^{-8}$. The value for σ is then found from
$$\begin{aligned}
Q\left(\frac{2.5 - 0.5}{\sigma}\right) &= 1.0050 \times 10^{-8} \\
\sigma &= \frac{2}{Q^{-1}(1.0050 \times 10^{-8})} = \frac{2}{5.611} = 0.3564 \\
\sigma^2 &= 0.127
\end{aligned} \tag{2-26}$$
$Q^{-1}(1.0050 \times 10^{-8})$ is found using the MATLAB function Qinv given in Appendix G and is Qinv(1.0050 × 10⁻⁸) = 5.611. A conclusion that can be drawn from this example is that when there is appreciable noise in a digital circuit, errors are almost certain to occur sooner or later.

Although many of the most useful properties of Gaussian random variables will become apparent only when two or more variables are considered, one that can be mentioned now is the ease with which high-order central moments can be determined. The nth central moment, which was defined in (2-11), can be expressed for a Gaussian random variable as
$$\overline{(X - \bar{X})^n} = \begin{cases} 0, & n \text{ odd} \\ 1 \cdot 3 \cdot 5 \cdots (n - 1)\,\sigma^n, & n \text{ even} \end{cases} \tag{2-27}$$
As an example of the use of (2-27), if n = 4, the fourth central moment is $\overline{(X - \bar{X})^4} = 3\sigma^4$. A word of caution should be noted, however. The relation between the nth general moment, $\overline{X^n}$, and the nth central moment is not always as simple as it is for n = 2. In the n = 4 Gaussian case, for example,
$$\overline{X^4} = (\bar{X})^4 + 6(\bar{X})^2\sigma^2 + 3\sigma^4$$

Before leaving the subject of Gaussian density functions, it is interesting to compare the defining equation, (2-14), with the probability associated with Bernoulli trials for the case of large n as approximated in (1-30). It will be noted that, except for the fact that k and n are integers, the DeMoivre-Laplace approximation has the same form as a Gaussian density function with a mean value of np and a variance of npq.
Since the Bernoulli probabilities are discrete, the exact density function for this case is a set of delta functions that increases in number as n increases, and as n becomes large the area of these delta functions follows a Gaussian law.
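Both the reverse design (2-26) and the moment formula (2-27) can be checked numerically. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library as a stand-in for the Qinv function of Appendix G, via $Q^{-1}(p) = -\Phi^{-1}(p)$. `Qinv` and the other function names are hypothetical. The sketch also confirms the Exercise 2-5.2 values 48 and 73 for a mean of 1 and a variance of 4.

```python
import math
from statistics import NormalDist

def Qinv(p):
    """Inverse Q-function for 0 < p < 1: Q(x) = p exactly when x = -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)

# Reverse design (2-26): probability 0.01 of any false trigger in 10**6 operations
n_ops, target = 10**6, 0.01
p = -math.expm1(math.log1p(-target) / n_ops)   # solves 1 - (1 - p)**n_ops = target
sigma = (2.5 - 0.5) / Qinv(p)
print(f"p = {p:.4e}, Qinv(p) = {Qinv(p):.3f}, sigma = {sigma:.4f}, sigma^2 = {sigma ** 2:.3f}")

def gaussian_central_moment(n, sigma):
    """(2-27): 0 for odd n, 1*3*5*...*(n-1) * sigma**n for even n."""
    if n % 2:
        return 0.0
    return math.prod(range(1, n, 2)) * sigma ** n

def gaussian_fourth_moment(mean, sigma):
    """Raw fourth moment of a Gaussian: mean^4 + 6 mean^2 sigma^2 + 3 sigma^4."""
    return mean ** 4 + 6 * mean ** 2 * sigma ** 2 + 3 * sigma ** 4

# Exercise 2-5.2 (mean 1, variance 4, so sigma = 2)
print(gaussian_central_moment(4, 2.0), gaussian_fourth_moment(1.0, 2.0))   # 48.0 73.0
```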
Another important result closely related to this is the central limit theorem. This famous theorem concerns the sum of a large number of independent random variables having the same probability density function. In particular, let the random variables be $X_1, X_2, \ldots, X_n$ and assume that they all have the same mean value, m, and the same variance, σ². Then define a normalized sum as
$$Y = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i - m) \tag{2-28}$$
Under conditions that are weak enough to be realized by almost any random variable encountered in real life, the central limit theorem states that the probability density function for Y approaches a Gaussian density function as n becomes large, regardless of the density function for the Xs. Furthermore, because of the normalization, the random variable Y will have zero mean and a variance of σ². The theorem is also true for more general conditions, but this is not the important aspect here. What is important is to recognize that a great many random phenomena that arise in physical situations result from the combined actions of many individual events. This is true for such things as thermal agitation of electrons in a conductor, shot noise from electrons or holes in a vacuum tube or transistor, atmospheric noise, turbulence in a medium, ocean waves, and many other physical sources of random disturbances. Hence, regardless of the probability density functions of the individual components (and these density functions are usually not even known), one would expect to find that the observed disturbance has a Gaussian density function. The central limit theorem provides a theoretical justification for assuming this, and, in almost all cases, experimental measurements bear out the soundness of this assumption.

In dealing with numerical values of the occurrences of random events, one of the tools frequently used is the histogram.
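The normalization in (2-28) and the central limit theorem can be watched at work in a short simulation. Purely for illustration, the sketch below assumes the $X_i$ are uniform on (0, 1), so m = 1/2 and σ² = 1/12, and it uses a fixed seed for repeatability. It checks that Y has a mean near zero, a variance near σ², and about 68% of its values within ±σ, as a Gaussian variable would. The names are illustrative.

```python
import random
import statistics

def normalized_sum(n, rng):
    """One sample of Y in (2-28): (1/sqrt(n)) * sum(X_i - m), with X_i uniform on (0, 1), m = 1/2."""
    m = 0.5
    return sum(rng.random() - m for _ in range(n)) / n ** 0.5

rng = random.Random(1)            # fixed seed so the run is repeatable
n, trials = 12, 20_000
ys = [normalized_sum(n, rng) for _ in range(trials)]

sigma2 = 1.0 / 12.0               # variance of one uniform(0, 1) variable
sample_mean = statistics.fmean(ys)
sample_var = statistics.pvariance(ys)
frac_within = sum(abs(y) <= sigma2 ** 0.5 for y in ys) / trials
print(sample_mean, sample_var, frac_within)   # compare with 0, 1/12 = 0.0833, and 0.683
```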
A histogram is generated from a set of random variables by sorting the data into a set of bins, or equal-sized intervals of the variable's range of values. The number of occurrences of the variable in each bin is counted and the result is plotted as a bar chart. An example will illustrate the procedure. Table 2-1 is a set of random variables drawn from a population having a Gaussian distribution. It is seen that the values extend from -211 to +276 for a total range of 487. Dividing this into 10 intervals, each of length 48.7, and counting the number of values in each interval leads to the values shown in Table 2-2. When these data are plotted as a bar graph, the result is the histogram shown in Figure 2-14.

Table 2-1 Random Variable Values

| | | | | |
|---:|---:|---:|---:|---:|
| 32 | -7 | -54 | 5 | 21 |
| -25 | 153 | -124 | 276 | 60 |
| 159 | 67 | -20 | -30 | 72 |
| 36 | -211 | 58 | -103 | 27 |
| -4 | -44 | 23 | -74 | 57 |

If the number of occurrences in a bin is divided by the total number of occurrences times the width of the bin,
an approximation to the probability density function is obtained. As more samples are used, the approximation gets better.

Table 2-2 Data for Histogram

| Bin Interval (center) | Number of Occurrences |
|---:|---:|
| -187 | 1 |
| -138 | 1 |
| -89 | 2 |
| -41 | 5 |
| 8 | 7 |
| 57 | 6 |
| 106 | 0 |
| 154 | 2 |
| 203 | 0 |
| 252 | 1 |

Figure 2-14 Histogram of data in Table 2-1 (horizontal axis: Values of Random Variable).

MATLAB provides a simple procedure for obtaining the histogram as well as for obtaining samples of random variables. If the data are in a vector x, then the histogram can be obtained with the command hist(x). The result of using this command with the data of Table 2-1 would be the graph shown in Figure 2-14. However, if more bins are desired, the command hist(x,n) can be used and
the data set will be divided into n bins. The command [m,v] = hist(x,n) returns two vectors: m contains the frequency counts and v contains the bin locations (the bin centers). To illustrate one way these commands can be used, consider the following MATLAB program that generates data, then computes the histogram of the data. The data are generated using the command x = randn(1,1000), which produces a vector of 1000 values having a Gaussian or "normal" probability distribution with zero mean and unit standard deviation. In the program the standard deviation is changed to 2 by multiplying the data by 2, and the mean is changed to 5 by adding 5 to each sample value. The resulting data set is shown in Figure 2-15. After the pause the program computes the data for the histogram, then divides the counts by the total number of samples times the bin width and plots the result as a bar chart similar to the histogram but whose values approximate those of the probability density function. The actual probability density function is superimposed on the bar chart. The result is shown in Figure 2-16. It is seen that the histogram closely follows the shape of the Gaussian probability density function.

```matlab
%gaushist.m  histogram of a Gaussian random variable
n = 1000;
x = 2*randn(1,n) + 5*ones(1,n);   %generate vector of samples (mean 5, std 2)
plot(x)
xlabel('Index'); ylabel('Amplitude'); grid
pause                              %wait for a key press
[m,z] = hist(x);                   %counts in 10 bins and bin centers
w = z(2) - z(1);                   %bin width (spacing of the bin centers)
mm = m/(n*w);                      %normalize counts to a density estimate
v = linspace(min(x),max(x));       %generate 100 values over range of rv x
y = (1/(2*sqrt(2*pi)))*exp(-((v-5*ones(size(v))).^2)/8);   %Gaussian pdf
bar(z,mm)                          %plot normalized histogram
hold on                            %retain histogram plot
plot(v,y)                          %superimpose plot of Gaussian pdf
xlabel('Random Variable Value'); ylabel('Probability Density')
hold off                           %release hold of figure
```

Figure 2-15 One thousand samples of a Gaussian random variable (horizontal axis: Index; vertical axis: Amplitude).

Figure 2-16 Normalized histogram of data of Figure 2-15 (horizontal axis: Random Variable Value; vertical axis: Probability Density).

Exercise 2-5.1
A Gaussian random variable has a mean value of 1 and a variance of 4. Find
a) the probability that the random variable has a negative value
b) the probability that the random variable has a value between 1 and 2
c) the probability that the random variable is greater than 4.
Answers: 0.3085, 0.1915, 0.0668

Exercise 2-5.2
For the random variable of Exercise 2-5.1, find
a) the fourth central moment
b) the fourth moment
c) the third central moment
d) the third moment.
Answers: 0, 13, 48, 73

2-6 Density Functions Related to Gaussian

The previous section has indicated some of the reasons for the tremendous importance of the Gaussian density function. Still another reason is that there are many other probability density functions, which arise in practical applications, that are related to the Gaussian density function and can be derived from it. The purpose of this section is to list some of these other density functions and indicate the situations under which they arise. They will not all be derived here, since in most cases insufficient background is available, but several of the more important ones will be derived as illustrations of particular techniques.

Distribution of Power

When the voltage or current in a circuit is the random variable, the power dissipated in a resistor is also a random variable that is proportional to the square of the voltage or current. The transformation that applies in this case is discussed in Section 2-3 and is used here to determine the probability density function associated with the power of a Gaussian voltage or current. In particular, let I be the random variable $I(t_1)$ and assume that $f_I(i)$ is Gaussian. The power random variable, W, is then given by
$$W = RI^2$$
and it is desired to find its probability density function $f_W(w)$. By analogy to the result in (2-6), this probability density function may be written as
$$f_W(w) = \begin{cases} \dfrac{1}{2\sqrt{Rw}}\left[f_I\left(\sqrt{\dfrac{w}{R}}\right) + f_I\left(-\sqrt{\dfrac{w}{R}}\right)\right], & w \ge 0 \\ 0, & w < 0 \end{cases} \tag{2-29}$$
If I is Gaussian and assumed to have zero mean, then
$$f_I(i) = \frac{1}{\sqrt{2\pi}\,\sigma_I}\exp\left(-\frac{i^2}{2\sigma_I^2}\right)$$
where $\sigma_I^2$ is the variance of I. Hence, $\sigma_I$ has the physical significance of being the rms value of the current. Furthermore, since the density function is symmetrical, $f_I(i) = f_I(-i)$. Thus, the two terms of (2-29) are identical and the probability density function of the power becomes
$$f_W(w) = \begin{cases} \dfrac{1}{\sigma_I\sqrt{2\pi R w}}\exp\left(-\dfrac{w}{2R\sigma_I^2}\right), & w \ge 0 \\ 0, & w < 0 \end{cases} \tag{2-30}$$
This density function is sketched in Figure 2-17. Straightforward calculation indicates that the mean value of the power is
$$\bar{W} = E[RI^2] = R\sigma_I^2$$
and the variance of the power is
$$\sigma_W^2 = \overline{W^2} - (\bar{W})^2 = E[R^2 I^4] - (\bar{W})^2 = 3R^2\sigma_I^4 - (R\sigma_I^2)^2 = 2R^2\sigma_I^4$$
It may be noted that the probability density function for the power is infinite at w = 0; that is, the most probable value of power is zero. This is a consequence of the fact that the most probable value of current is also zero and that the derivative of the transformation (dW/di) is zero here. It is important to note, however, that there is not a delta function in the probability density function.

Figure 2-17 Density function for the power of a Gaussian current.
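The mean $R\sigma_I^2$ and variance $2R^2\sigma_I^4$ can be confirmed by integrating (2-30) numerically. In the sketch below, the names are illustrative. The substitution $w = t^2$ removes the $1/\sqrt{w}$ singularity at the origin before a midpoint rule is applied. The parameters R = 4 Ω and $\sigma_I$ = 1 A are assumed example values.

```python
import math

def power_pdf(w, R, sigma_i):
    """Density (2-30) of W = R*I**2 for a zero-mean Gaussian current with rms value sigma_i."""
    if w <= 0.0:
        return 0.0
    return math.exp(-w / (2.0 * R * sigma_i ** 2)) / (sigma_i * math.sqrt(2.0 * math.pi * R * w))

def power_moments(R, sigma_i, w_max, n=100_000):
    """Midpoint-rule E[W] and Var[W] from (2-30), integrated over 0 < w < w_max.

    With w = t**2 (dw = 2t dt) the integrand stays finite at the origin.
    """
    t_max = math.sqrt(w_max)
    h = t_max / n
    m1 = m2 = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        w = t * t
        dp = power_pdf(w, R, sigma_i) * 2.0 * t * h   # probability in the slice
        m1 += w * dp
        m2 += w * w * dp
    return m1, m2 - m1 ** 2

R, sigma_i = 4.0, 1.0
mean_w, var_w = power_moments(R, sigma_i, w_max=400.0)
print(mean_w, R * sigma_i ** 2)            # expect R*sigma_I^2 = 4
print(var_w, 2 * R ** 2 * sigma_i ** 4)    # expect 2*R^2*sigma_I^4 = 32
```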
The probability distribution function for the power can be obtained, in principle, by integrating the probability density function for the power. However, this integration does not result in a closed-form result. Nevertheless, it is possible to obtain the desired probability distribution function quite readily by employing the basic definition. Specifically, the probability that the power is less than or equal to some value w is just the same as the probability that the current is between the values of $+\sqrt{w/R}$ and $-\sqrt{w/R}$. Thus, since I is assumed to be Gaussian with zero mean and variance $\sigma_I^2$, the probability distribution function for the power becomes
$$\begin{aligned}
F_W(w) &= \Pr\left[I \le \sqrt{w/R}\right] - \Pr\left[I \le -\sqrt{w/R}\right] = \Phi\left(\frac{\sqrt{w/R}}{\sigma_I}\right) - \Phi\left(-\frac{\sqrt{w/R}}{\sigma_I}\right) \\
&= 2\Phi\left(\frac{\sqrt{w/R}}{\sigma_I}\right) - 1, \qquad w \ge 0
\end{aligned}$$
with $F_W(w) = 0$ for $w < 0$. In terms of the Q-function this becomes
$$F_W(w) = \begin{cases} 1 - 2Q\left(\dfrac{\sqrt{w/R}}{\sigma_I}\right), & w \ge 0 \\ 0, & w < 0 \end{cases}$$
As an illustration of the use of the power distribution function, consider the power delivered to a loudspeaker in a typical stereo system. Assume that the speaker has a resistance of 4 Ω and is rated for a maximum power of 25 W. If the current driving the speaker is assumed to be Gaussian and at a level that provides an average power of 4 W, what is the probability that the maximum power level of the speaker will be exceeded? Since 4 W dissipated in 4 Ω implies a value of $\sigma_I^2 = 1$, it follows that
$$\Pr(W > 25) = 1 - F_W(25) = 2Q\left(\sqrt{25/4}\right) = 2Q(2.5) = 2(0.0062) = 0.0124$$
This probability implies that the maximum speaker power is exceeded several times per second for a Gaussian signal. The situation is probably worse than this in an actual case because the probability density function of music is not Gaussian, but tends to have peak values that are more probable than that predicted by the Gaussian assumption.

Exercise 2-6.1
A Gaussian random voltage having a mean value of zero and a standard deviation of 4 V is applied to a resistance of 2 Ω. Find
a) the approximate probability that the power dissipated in the resistance is between 9.9 W and 10.1 W (use the power density function)
b) the probability that the power dissipated in the resistor is greater than 25 W
c) the probability that the power dissipated in the resistor is less than or equal to 10 W.

Answers: 0.0048, 0.7364, 0.0771

Rayleigh Distribution

The Rayleigh probability density function arises in several different physical situations. For example, it will be shown later that the peak values (that is, the envelope) of a random voltage or current having a Gaussian probability density function follow the Rayleigh density function. Lord Rayleigh's original derivation of this density function (1880) applied it to the envelope of the sum of many sine waves of different frequencies. The density also arises with the errors in aiming firearms, missiles and other projectiles, if the errors in each of the two rectangular coordinates have independent Gaussian probability densities. Take the origin of a rectangular coordinate system to be the target, the error along one axis to be $X$ and the error along the other axis to be $Y$. The total miss distance is then simply

$$R = \sqrt{X^2 + Y^2}$$

When $X$ and $Y$ are independent Gaussian random variables with zero mean and equal variances, $\sigma^2$, the probability density function for $R$ is

$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\exp\left(-\dfrac{r^2}{2\sigma^2}\right), & r \ge 0 \\ 0, & r < 0 \end{cases} \tag{2-31}$$

This is the Rayleigh probability density function, sketched in Figure 2-18 for two different values of $\sigma^2$. The maximum value of the density function is at $r = \sigma$, but the density function is not symmetrical about this maximum point.

The mean value of the Rayleigh-distributed random variable is easily computed from

$$\bar{R} = \int_0^\infty r f_R(r)\,dr = \int_0^\infty \frac{r^2}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)dr = \sqrt{\frac{\pi}{2}}\,\sigma$$

and the mean-square value from
[Figure 2-18: The Rayleigh probability density function.]

$$\overline{R^2} = \int_0^\infty r^2 f_R(r)\,dr = \int_0^\infty \frac{r^3}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)dr = 2\sigma^2$$

The variance of $R$ is therefore given by

$$\sigma_R^2 = \overline{R^2} - (\bar{R})^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2 = 0.429\sigma^2$$

This variance is not the same as the variance $\sigma^2$ of the Gaussian random variables that generate the Rayleigh random variable. Also, unlike the Gaussian density function, both the mean and the variance depend upon a single parameter ($\sigma^2$) and cannot be adjusted independently.

It is straightforward to find the probability distribution function for the Rayleigh random variable, because the density function can be integrated readily. Thus,

$$F_R(r) = \begin{cases} \displaystyle\int_0^r \frac{u}{\sigma^2}\exp\left(-\frac{u^2}{2\sigma^2}\right)du = 1 - \exp\left(-\frac{r^2}{2\sigma^2}\right), & r \ge 0 \\ 0, & r < 0 \end{cases} \tag{2-32}$$

As an example of the Rayleigh density function, consider an aiming problem in which an archer shoots at a target two feet in diameter whose bull's-eye is centered on the origin of an XY coordinate system. The position at which any arrow strikes the target is a random variable having an X-component and a Y-component. The standard deviation of these components is found to be 1/4 foot; that is, $\sigma_X = \sigma_Y = 1/4$. Assume that the X and Y components of the hit position are independent Gaussian random variables. Then the distance from the hit position to the center of the target (the miss distance) is a Rayleigh-distributed random variable with probability density function

$$f_R(r) = 16r\exp(-8r^2), \qquad r \ge 0$$
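Before the example's numbers are worked out, the claim that this miss distance is Rayleigh distributed can be checked by simulation. This Python sketch draws independent zero-mean Gaussian $X$ and $Y$ errors with the archer's $\sigma = 1/4$ foot and forms $R = \sqrt{X^2 + Y^2}$. It then compares the sample mean, the sample variance and the fraction with $R \le 1/12$ foot against $\sqrt{\pi/2}\,\sigma$, $(2 - \pi/2)\sigma^2$ and (2-32). The sample size and seed are arbitrary choices.

```python
import math
import random


def rayleigh_cdf(x, s):
    """Rayleigh distribution function (2-32) with parameter s."""
    return 1.0 - math.exp(-x * x / (2 * s * s)) if x >= 0 else 0.0


sigma = 0.25            # standard deviation of each aiming error, feet
n = 200_000             # number of simulated arrows (arbitrary)
rng = random.Random(7)  # fixed seed for repeatability

# Miss distance R = sqrt(X^2 + Y^2) with independent zero-mean Gaussian X, Y
r = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]

mean_r = sum(r) / n
var_r = sum((x - mean_r) ** 2 for x in r) / (n - 1)
frac_bullseye = sum(1 for x in r if x <= 1 / 12) / n  # 2-inch bull's-eye

print(f"mean        {mean_r:.4f}  theory {math.sqrt(math.pi / 2) * sigma:.4f}")
print(f"variance    {var_r:.5f} theory {(2 - math.pi / 2) * sigma**2:.5f}")
print(f"P(R<=1/12)  {frac_bullseye:.4f}  theory {rayleigh_cdf(1 / 12, sigma):.4f}")
```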
Using the results obtained above, the mean value of the miss distance is $\bar{R} = \sqrt{\pi/2}\,(1/4) = 0.313$ feet and its standard deviation is $\sigma_R = \sqrt{0.429}\,(1/4) = 0.164$ feet. From the distribution function, the probability that the target will be missed completely is

$$\Pr(\text{Miss}) = 1 - F_R(1) = 1 - \left[1 - \exp\left(-\frac{1^2}{2(0.25)^2}\right)\right] = e^{-8} = 3.35 \times 10^{-4}$$

Similarly, if the bull's-eye is two inches in diameter, the probability of hitting the bull's-eye is

$$\Pr(\text{Bull's-eye}) = F_R\left(\frac{1}{12}\right) = 1 - \exp\left(-\frac{8}{144}\right) = 0.0540$$

Obviously, this example describes an archer who is not very skillful, in spite of the fact that he rarely misses the entire target!

Exercise 2-6.2

An amateur marksman shooting at a target 10 inches in diameter has an average miss distance from the center of 2 inches. What is the probability that he will miss the target completely?

Answer: 0.0074

Maxwell Distribution

A classical problem in thermodynamics is to determine the probability density function of the velocity of a molecule in a perfect gas. The basic assumption is that each component of velocity is Gaussian with zero mean and a variance of $\sigma^2 = kT/m$. Here $k = 1.38 \times 10^{-23}$ W·s/K is Boltzmann's constant, $T$ is the absolute temperature in kelvins, $m$ is the mass of the molecule in kilograms, and K is the kelvin unit of temperature. The total velocity is, therefore,

$$V = \sqrt{V_x^2 + V_y^2 + V_z^2}$$

and is said to have a Maxwell distribution. The resulting probability density function can be shown to be
$$f_V(v) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,\dfrac{v^2}{\sigma^3}\exp\left(-\dfrac{v^2}{2\sigma^2}\right), & v \ge 0 \\ 0, & v < 0 \end{cases} \tag{2-33}$$

The mean value of a Maxwell-distributed random variable (the average molecule velocity) can be found in the usual way and is

$$\bar{V} = \sqrt{\frac{8}{\pi}}\,\sigma$$

The mean-square value and variance can be shown to be

$$\overline{V^2} = 3\sigma^2$$

$$\sigma_V^2 = \overline{V^2} - (\bar{V})^2 = \left(3 - \frac{8}{\pi}\right)\sigma^2 = 0.453\sigma^2$$

The mean kinetic energy can be obtained from $\overline{V^2}$, since

$$\epsilon = \tfrac{1}{2}mV^2$$

and

$$E[\epsilon] = \tfrac{1}{2}m\overline{V^2} = \tfrac{3}{2}m\sigma^2 = \tfrac{3}{2}m\left(\frac{kT}{m}\right) = \tfrac{3}{2}kT$$

which is the classical result.

The probability distribution function for the Maxwell density cannot be expressed readily in terms of elementary functions, or even in terms of tabulated functions. In most cases involving this distribution function, therefore, the integration must be carried out numerically. As an illustration of the Maxwell distribution, suppose we want the probability that a given gas molecule has a kinetic energy more than twice the mean kinetic energy of all the molecules. The kinetic energy is $\epsilon = \frac{1}{2}mV^2$ and the mean kinetic energy is just $(3/2)m\sigma^2$. Hence a molecule having more than twice the mean kinetic energy has velocity

$$V > \sqrt{6}\,\sigma$$
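The tail integral that the text evaluates next has no elementary antiderivative, so a numerical rule is needed. The sketch below uses composite Simpson's rule; that is one reasonable choice, and the text does not say which method it used. It works with $\sigma = 1$, so velocities are in units of $\sigma$. It truncates the infinite upper limit at $15\sigma$ (an assumed cutoff beyond which the integrand is negligible). It also checks the normalization of (2-33) and the mean $\sqrt{8/\pi}\,\sigma$.

```python
import math


def maxwell_pdf(v, sigma=1.0):
    """Maxwell density (2-33): sqrt(2/pi) * v^2 / sigma^3 * exp(-v^2 / (2 sigma^2))."""
    if v < 0:
        return 0.0
    return math.sqrt(2 / math.pi) * v**2 / sigma**3 * math.exp(-v**2 / (2 * sigma**2))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


sigma = 1.0
upper = 15 * sigma  # assumed truncation of the infinite upper limit
area = simpson(maxwell_pdf, 0.0, upper)
mean_v = simpson(lambda v: v * maxwell_pdf(v), 0.0, upper)
tail = simpson(maxwell_pdf, math.sqrt(6) * sigma, upper)

print(f"total probability   {area:.6f}")
print(f"mean velocity       {mean_v:.4f}  (sqrt(8/pi) = {math.sqrt(8 / math.pi):.4f})")
print(f"Pr(V > sqrt(6) sig) {tail:.4f}")
```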
The probability that a molecule has a velocity in this range is

$$\Pr(V > \sqrt{6}\,\sigma) = \int_{\sqrt{6}\,\sigma}^{\infty}\sqrt{\frac{2}{\pi}}\,\frac{v^2}{\sigma^3}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv$$

This can be integrated numerically to yield

$$\Pr(\epsilon > 2\bar{\epsilon}) = \Pr(V > \sqrt{6}\,\sigma) = 0.1116$$

Exercise 2-6.3

In a certain gas at 400 K, the number of molecules having velocities in the vicinity of $1 \times 10^3$ meters/second is twice the number having velocities in the vicinity of $4 \times 10^3$ meters/second. Find

a) the mean velocity of the molecules
b) the mass of the molecules.

Answers: $2.53 \times 10^{-27}$, 2347

Chi-Square Distribution

A generalization of the above results arises if one defines a random variable as

$$X^2 = Y_1^2 + Y_2^2 + \cdots + Y_n^2 \tag{2-34}$$

where $Y_1, Y_2, \ldots, Y_n$ are independent Gaussian random variables with zero mean and variance 1. The random variable $X^2$ is said to have a chi-square distribution with $n$ degrees of freedom, and its probability density function is

$$f(x^2) = \begin{cases} \dfrac{(x^2)^{n/2 - 1}}{2^{n/2}\,\Gamma(n/2)}\exp\left(-\dfrac{x^2}{2}\right), & x^2 \ge 0 \\ 0, & x^2 < 0 \end{cases} \tag{2-35}$$

With suitable normalization of the random variables (to obtain unit variance), the power distribution discussed above is seen to be chi-square with $n = 1$. Likewise, in the Rayleigh
distribution, the square of the miss distance ($R^2$) is chi-square with $n = 2$, and in the Maxwell distribution the square of the velocity ($V^2$) is chi-square with $n = 3$. This last case leads to the probability density function of molecule energies.

The mean and variance of a chi-square random variable are particularly simple because of the initial assumption of unit variance for the components. Thus,

$$\overline{X^2} = n, \qquad \sigma_{X^2}^2 = 2n$$

The chi-square distribution arises in many signal detection problems in which one samples an observed voltage and tries to decide whether it is just noise or also contains a signal. If the observed voltage is just noise, the samples have zero mean and the chi-square distribution described above applies. If, however, there is also a signal in the observed voltage, the mean value of the samples is not zero. The random variable obtained by summing the squares of the samples as in (2-34) then has a noncentral chi-square distribution. Although detection problems of this sort are extremely important, further discussion of this application of the chi-square distribution is beyond the scope of this book.

Exercise 2-6.4

Twelve independent samples of a Gaussian voltage are taken, and each sample is found to have zero mean and a variance of 9. A new random variable is constructed by summing the squares of these samples. Find

a) the mean
b) the variance of this new random variable.

Answers: 1944, 108

Log-Normal Distribution

A somewhat different relationship to the Gaussian distribution arises for random variables that are defined as the logarithms of other random variables. For example, in communication systems the attenuation of the signal power in the transmission path is frequently expressed in nepers and is calculated from

$$A = \ln\left(\frac{W_{out}}{W_{in}}\right) \text{ nepers}$$
where $W_{in}$ and $W_{out}$ are the input and output signal powers, respectively. It is an experimentally observed fact that the attenuation $A$ is very often quite close to a Gaussian random variable. The question that arises, therefore, concerns the probability density function of the power ratio.

To generalize this result somewhat, let two random variables be related by

$$Y = \ln X$$

or, equivalently, by

$$X = e^Y$$

and assume that $Y$ is Gaussian with mean $\bar{Y}$ and variance $\sigma_Y^2$. By using (2-5) it is easy to show that the probability density function of $X$ is

$$f_X(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma_Y\,x}\exp\left[-\dfrac{(\ln x - \bar{Y})^2}{2\sigma_Y^2}\right], & x > 0 \\ 0, & x \le 0 \end{cases} \tag{2-36}$$

This is the log-normal probability density function. In engineering work, base 10 is frequently used for the logarithm rather than base $e$, but it is simple to convert from one to the other. Some typical density functions are sketched in Figure 2-19.

The mean and variance of the log-normal random variable can be evaluated in the usual manner and are

$$\bar{X} = \exp\left(\bar{Y} + \tfrac{1}{2}\sigma_Y^2\right)$$

$$\sigma_X^2 = \left[\exp(\sigma_Y^2) - 1\right]\exp\left[2\left(\bar{Y} + \tfrac{1}{2}\sigma_Y^2\right)\right]$$

[Figure 2-19: The log-normal probability density function.]
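A short simulation can confirm the log-normal mean and variance formulas. It draws Gaussian $Y$, sets $X = e^Y$, and compares the sample moments with $\exp(\bar Y + \sigma_Y^2/2)$ and $[\exp(\sigma_Y^2) - 1]\exp[2(\bar Y + \sigma_Y^2/2)]$. The parameter values $\bar Y = 0$ and $\sigma_Y = 0.5$, the sample size and the seed are assumptions chosen only for illustration.

```python
import math
import random

y_bar, sigma_y = 0.0, 0.5  # assumed parameters of the underlying Gaussian Y
n = 200_000
rng = random.Random(11)    # fixed seed for repeatability

# X = exp(Y) with Y Gaussian, i.e. Y = ln X
x = [math.exp(rng.gauss(y_bar, sigma_y)) for _ in range(n)]
mean_x = sum(x) / n
var_x = sum((v - mean_x) ** 2 for v in x) / (n - 1)

mean_theory = math.exp(y_bar + 0.5 * sigma_y**2)
var_theory = (math.exp(sigma_y**2) - 1) * math.exp(2 * (y_bar + 0.5 * sigma_y**2))

print(f"mean {mean_x:.4f}  theory {mean_theory:.4f}")
print(f"var  {var_x:.4f}  theory {var_theory:.4f}")
```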
The distribution function for the log-normal random variable cannot be expressed in terms of elementary functions. If calculations involving the distribution function are required, the integration usually has to be carried out by numerical methods.

Exercise 2-6.5

A log-normal random variable is generated by a Gaussian random variable having a mean value of 2 and a variance of 1.

a) Find the most probable value of the log-normal random variable.
b) Repeat if the Gaussian random variable has the same mean value and a variance of 6.

Answers: 2.718, 0.0183

2-7 Other Probability Density Functions

In addition to the density functions related to the Gaussian, many others frequently arise in engineering. Some of these are described here, together with a brief discussion of the situations in which they arise.

Uniform Distribution

The uniform distribution was mentioned in an earlier section and used for illustrative purposes; it is generalized here. The uniform distribution usually arises in physical situations in which there is no preferred value for the random variable. For example, events that occur at random instants of time (such as the emission of radioactive particles) are often assumed to occur at equally probable times. The unknown phase angle of a sinusoidal source is usually assumed to be uniformly distributed over a range of $2\pi$ radians. When the time position of pulses in a periodic pulse sequence (such as a radar transmission) is unknown with respect to zero time, it may be assumed uniformly distributed over one period. All of these situations will be used in later examples.

The uniform probability density function may be represented generally as

$$f(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 < x \le x_2 \\ 0, & \text{otherwise} \end{cases} \tag{2-37}$$
It is quite straightforward to show that

$$\bar{X} = \tfrac{1}{2}(x_1 + x_2) \tag{2-38}$$

and

$$\sigma_X^2 = \tfrac{1}{12}(x_2 - x_1)^2 \tag{2-39}$$

The probability distribution function of a uniformly distributed random variable is obtained easily from the density function by integration. The result is

$$F_X(x) = \begin{cases} 0, & x \le x_1 \\ \dfrac{x - x_1}{x_2 - x_1}, & x_1 < x \le x_2 \\ 1, & x > x_2 \end{cases} \tag{2-40}$$

One important application of the uniform distribution is describing the errors in analog-to-digital conversion. This operation takes a continuous signal that can have any value at a given time instant and converts it into a binary number with a fixed number of binary digits. A fixed number of binary digits can represent only a discrete set of values, so the difference between the actual value and the closest discrete value is the error. This is illustrated in Figure 2-20. To determine the mean-square value of the error, assume that the error is uniformly distributed over the interval from $-\Delta x/2$ to $\Delta x/2$, where $\Delta x$ is the spacing between adjacent quantizing levels. Then, from (2-38), the mean error is zero, and from (2-39) the variance or mean-square error is $\frac{1}{12}(\Delta x)^2$.

[Figure 2-20: Error in analog-to-digital conversion (quantizing levels).]

The uniform probability density function also arises quite naturally with sinusoidal time functions in which the phase is a random variable. For example, suppose a sinusoidal signal is transmitted at one point and received at a distant point. If the path over which the signal travels is many wavelengths long, the phase of the received signal is truly a random variable. Since there is no physical reason for any one phase angle to be preferred over any other, the usual assumption is that the phase is uniformly distributed over a range of $2\pi$. To illustrate
this, suppose we have a time function of the form

$$x(t) = \cos(\omega t - \theta)$$

The phase angle $\theta$ is assumed to be a random variable whose probability density function is

$$f_\theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{elsewhere} \end{cases}$$

From the previous discussion of the uniform density function, the mean value of $\theta$ is clearly

$$\bar{\theta} = \pi$$

and the variance of $\theta$ is

$$\sigma_\theta^2 = \frac{\pi^2}{3}$$

One could just as well have defined the region over which $\theta$ exists to be $-\pi$ to $+\pi$, or any other region spanning $2\pi$. Such a choice would not change the variance of $\theta$ at all, but it would change the mean value.

Another application of the uniform probability density function is generating samples of random variables that have other probability density functions. The procedure works as follows. Let $X$ be a random variable uniformly distributed over the interval (0, 1), and let $Y$ be a random variable with probability distribution function $F_Y(y)$. We want a function $q(x)$ such that the random variable $Y = q(X)$ has the probability distribution $F_Y(y)$. From the nature of probability distribution functions, $q(x)$ must be a monotonically increasing function of its argument. Therefore, if $q(X) \le q(x)$, then $X \le x$ and

$$F_Y(y) = \Pr(Y \le y) = \Pr[q(X) \le q(x)] = \Pr(X \le x) = F_X(x)$$

Solving for $y$ gives

$$y = F_Y^{-1}[F_X(x)]$$

However, $X$ exists only over (0, 1), and in this region $F_X(x) = x$, so the final result is

$$y = q(x) = F_Y^{-1}(x) \tag{2-41}$$

From (2-41) it is seen that the transformation from $X$ to $Y$ involves the inverse of the probability distribution function of $Y$. As an example, suppose we want to generate samples of a random variable having a Rayleigh probability density function. The probability density and distribution functions are
$$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}e^{-r^2/2\sigma^2}, & r \ge 0 \\ 0, & r < 0 \end{cases} \qquad F_R(r) = \begin{cases} 1 - e^{-r^2/2\sigma^2}, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

To illustrate the procedure, let $\sigma = 2$, giving

$$F_R(r) = 1 - e^{-r^2/8}$$

Solving $u = F_R(r)$ for $r$ gives the inverse as

$$r = F_R^{-1}(u) = \sqrt{-8\ln(1 - u)}$$

The desired transformation of the uniformly distributed random variable $X$ is, therefore,

$$Y = \sqrt{-8\ln(1 - X)}$$

The following MATLAB program uses this transformation to generate 10,000 samples of a Rayleigh-distributed random variable. It plots the result as an approximation to the probability density function, as described above.

```matlab
% Rayleigh.m  compute samples of a Rayleigh distribution
N = 10000;                        % number of samples
M = 50;                           % number of histogram bins
x = rand(1,N);                    % uniform dist (0,1)
y = sqrt(8)*(-log(1-x)).^0.5;     % transformed rv, Y = sqrt(-8 ln(1-X))
[p,q] = hist(y,M);                % bin counts p at bin centers q
bin = q(2) - q(1);                % bin size (hist uses equal-width bins)
pp = p/(N*bin);                   % approx value of pdf
z = 0.25*q.*exp(-0.125*q.^2);     % actual pdf at center of bins (sigma = 2)
bar(q,pp)                         % plot approx to pdf
hold on                           % save bar graph
plot(q,z)                         % superimpose true pdf
hold off                          % release hold
xlabel('magnitude'); ylabel('PDF AND APPROXIMATION')
```

Figure 2-21 shows the true probability density function superimposed on the histogram approximation. The approximation is quite good. In Appendix G an example
of this procedure is considered in which no explicit expression for the inverse of the probability distribution function exists and inverse interpolation is used.

[Figure 2-21: PDFs of approximation and true Rayleigh distributed random variables (PDF and approximation versus magnitude).]

Exercise 2-7.1

A continuous signal that can assume any value between 0 V and +10 V with equal probability is converted to digital form by quantizing.

a) How many discrete levels are required for the mean-square value of the quantizing error to be 0.01 V²?
b) If the number of discrete levels is to be a power of 2, so that the levels can be encoded efficiently into a binary number, how many levels are required to keep the mean-square quantizing error no greater than 0.01 V²?
c) If the number of levels of part (b) is used, what is the actual mean-square quantizing error?

Answers: 0.008, 29, 32
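The quantizing-error result (mean-square error $(\Delta x)^2/12$) and Exercise 2-7.1 can be checked with a short Python sketch. It assumes the $L$ quantizing levels sit at the midpoints of $L$ equal cells spanning the 10-V range, so $\Delta x = 10/L$. Other level placements, such as levels at both end points, give slightly different numbers. The helper names `levels_needed` and `simulated_mse` are hypothetical.

```python
import math
import random


def levels_needed(span, target_mse):
    """Smallest L with (span / L)**2 / 12 <= target_mse, taking dx = span / L."""
    return math.ceil(span / math.sqrt(12 * target_mse))


def simulated_mse(span, levels, n=200_000, seed=5):
    """Quantize uniform samples on (0, span) to the midpoint of their cell."""
    rng = random.Random(seed)
    dx = span / levels
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, span)
        k = min(int(x / dx), levels - 1)    # index of the cell containing x
        total += (x - (k + 0.5) * dx) ** 2  # error relative to that cell's midpoint
    return total / n


span, target = 10.0, 0.01               # volts, volts^2 (from the exercise)
L_a = levels_needed(span, target)       # part (a)
L_b = 2 ** math.ceil(math.log2(L_a))    # part (b): next power of 2
mse_c = (span / L_b) ** 2 / 12          # part (c), using (2-39)
mse_sim = simulated_mse(span, L_b)

print(L_a, L_b, round(mse_c, 5), round(mse_sim, 5))
```

Under this level placement, part (c) comes out near 0.0081 V², and the simulated value should agree with the $(\Delta x)^2/12$ prediction to within a fraction of a percent.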
Exponential and Related Distributions

It was noted in the discussion of the uniform distribution that events occurring at random time instants are often assumed to occur at equally probable times. Thus, if the average time interval between events is denoted $\bar{\tau}$, the probability that an event will occur in a time interval $\Delta t$ that is short compared to $\bar{\tau}$ is just $\Delta t/\bar{\tau}$, regardless of where that time interval is. From this assumption it is possible to derive the probability distribution function (and, hence, the density function) for the time interval between events.

To carry out this derivation, consider the sketch in Figure 2-22. Assume that an event has occurred at time $t_0$; we want the probability that the next event occurs at a random time between $t_0 + \tau$ and $t_0 + \tau + \Delta t$. If the distribution function for $\tau$ is $F(\tau)$, then this probability is just $F(\tau + \Delta t) - F(\tau)$. This probability must also equal the product of the probabilities of two independent events: the event did not occur between $t_0$ and $t_0 + \tau$, and it did occur between $t_0 + \tau$ and $t_0 + \tau + \Delta t$. Since

$1 - F(\tau)$ = probability that the event did not occur between $t_0$ and $t_0 + \tau$

$\Delta t/\bar{\tau}$ = probability that it did occur in $\Delta t$

it follows that

$$F(\tau + \Delta t) - F(\tau) = \left[1 - F(\tau)\right]\left(\frac{\Delta t}{\bar{\tau}}\right)$$

Upon dividing both sides by $\Delta t$ and letting $\Delta t$ approach zero, it is clear that

$$\lim_{\Delta t \to 0}\frac{F(\tau + \Delta t) - F(\tau)}{\Delta t} = \frac{dF(\tau)}{d\tau} = \frac{1}{\bar{\tau}}\left[1 - F(\tau)\right]$$

The latter two terms form a first-order differential equation that can be solved to yield

$$F(\tau) = 1 - \exp\left(-\frac{\tau}{\bar{\tau}}\right), \qquad \tau \ge 0 \tag{2-42}$$

[Figure 2-22: Time interval between events.]
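The limiting argument above can be made concrete. The sketch below iterates the finite-$\Delta t$ relation $F(\tau + \Delta t) = F(\tau) + [1 - F(\tau)]\,\Delta t/\bar\tau$ from the starting value $F(0) = 0$ (an interval between events cannot be negative), and shows the result approaching (2-42) as $\Delta t$ shrinks. The values $\bar\tau = 100$ and $\tau = 200$ mirror the spacecraft example later in this section; the step sizes are arbitrary, and `F_stepwise` is a hypothetical helper name.

```python
import math


def F_stepwise(tau, tau_bar, dt):
    """Iterate F(t + dt) = F(t) + [1 - F(t)] * dt / tau_bar, starting from F(0) = 0."""
    F = 0.0
    for _ in range(round(tau / dt)):
        F += (1.0 - F) * dt / tau_bar
    return F


tau_bar, tau = 100.0, 200.0
exact = 1.0 - math.exp(-tau / tau_bar)  # (2-42)
errors = {}
for dt in (10.0, 1.0, 0.01):
    approx = F_stepwise(tau, tau_bar, dt)
    errors[dt] = abs(approx - exact)
    print(f"dt = {dt:>5}: F = {approx:.6f}   exact = {exact:.6f}")
```

The stepwise value equals $1 - (1 - \Delta t/\bar\tau)^{\tau/\Delta t}$, which tends to $1 - e^{-\tau/\bar\tau}$ as $\Delta t \to 0$.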
In evaluating the arbitrary constant, use is made of the fact that $F(0) = 0$, since $\tau$ can never be negative.

The probability density function for the time interval between events can be obtained from (2-42) by differentiation. Thus,

$$f(\tau) = \begin{cases} \dfrac{1}{\bar{\tau}}\exp\left(-\dfrac{\tau}{\bar{\tau}}\right), & \tau \ge 0 \\ 0, & \tau < 0 \end{cases} \tag{2-43}$$

This is known as the exponential probability density function and is sketched in Figure 2-23 for two different values of the average time interval.

[Figure 2-23: The exponential probability density function $f(\tau)$.]

As would be expected, the mean value of $\tau$ is just $\bar{\tau}$. That is,

$$E[\tau] = \int_0^\infty \frac{\tau}{\bar{\tau}}\exp\left(-\frac{\tau}{\bar{\tau}}\right)d\tau = \bar{\tau}$$

The variance turns out to be

$$\sigma_\tau^2 = (\bar{\tau})^2$$

This density function (like the Rayleigh) is a single-parameter one. Thus, the mean and variance are uniquely related, and one determines the other.

As an illustration of the exponential distribution, suppose that component failures in a spacecraft occur independently and uniformly in time, with an average time between failures of 100 days. The spacecraft starts out on a 200-day mission with all components functioning. What is the probability that it will complete the mission without a component failure? This is the probability that the time to the first failure is greater than 200 days. That is simply $[1 - F(200)]$, since $F(200)$ is the probability that this interval is less than (or equal to) 200 days. Hence, from (2-42),

$$1 - F(\tau) = 1 - \left[1 - \exp\left(-\frac{\tau}{\bar{\tau}}\right)\right] = \exp\left(-\frac{\tau}{\bar{\tau}}\right)$$

and for $\bar{\tau} = 100$, $\tau = 200$, this becomes
$$1 - F(200) = \exp\left(-\frac{200}{100}\right) = 0.1353$$

As a second example of the exponential distribution, consider a traveling-wave tube (TWT) used as an amplifier in a satellite communication system, and assume that it has a mean time to failure (MTF) of 4 years. That is, the average lifetime of such a traveling-wave tube is 4 years, although any particular device may fail sooner or last longer. The actual lifetime, $T$, is a random variable with an exponential distribution, so we can determine the probability associated with any specified lifetime. For example, the probability that the TWT will survive for more than 4 years is

$$\Pr(T > 4) = 1 - F(4) = 1 - \left(1 - e^{-4/4}\right) = 0.368$$

Similarly, the probability that the TWT will fail within the first year is

$$\Pr(T \le 1) = F(1) = 1 - e^{-1/4} = 0.221$$

and the probability that it will fail between years 4 and 6 is

$$\Pr(4 < T \le 6) = F(6) - F(4) = \left(1 - e^{-6/4}\right) - \left(1 - e^{-4/4}\right) = 0.1447$$

Finally, the probability that the TWT will last more than 10 years is

$$\Pr(T > 10) = 1 - F(10) = 1 - \left(1 - e^{-10/4}\right) = 0.0821$$

The random variable in the exponential distribution is the time interval between adjacent events. This can be generalized so that the random variable is the time interval between any event and the $k$th following event. The probability distribution for this random variable is known as the Erlang distribution, and its probability density function is

$$f_k(\tau) = \begin{cases} \dfrac{\tau^{k-1}\exp(-\tau/\bar{\tau})}{(\bar{\tau})^k\,(k-1)!}, & \tau \ge 0, \quad k = 1, 2, 3, \ldots \\ 0, & \tau < 0 \end{cases} \tag{2-44}$$

Such a random variable is said to be an Erlang random variable of order $k$. The exponential distribution is simply the special case $k = 1$. The mean and variance in the general case are $k\bar{\tau}$ and $k(\bar{\tau})^2$, respectively.
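The traveling-wave-tube numbers follow directly from (2-42). The Erlang mean $k\bar\tau$ and variance $k(\bar\tau)^2$ can be checked by building each Erlang sample as the sum of $k$ independent exponential intervals. Each interval is generated with the inverse-transform rule (2-41), $\tau = -\bar\tau\ln(1 - U)$ with $U$ uniform on (0, 1). In the simulation part, the order $k = 3$, $\bar\tau = 2$, the sample size and the seed are assumptions for illustration; `exp_cdf` and `erlang_sample` are hypothetical helper names.

```python
import math
import random


def exp_cdf(t, t_bar):
    """Exponential distribution function (2-42)."""
    return 1.0 - math.exp(-t / t_bar) if t >= 0 else 0.0


# Traveling-wave tube with a 4-year mean time to failure (values from the text)
mtf = 4.0
p_survive_4 = 1 - exp_cdf(4, mtf)
p_fail_year_1 = exp_cdf(1, mtf)
p_fail_4_to_6 = exp_cdf(6, mtf) - exp_cdf(4, mtf)
p_survive_10 = 1 - exp_cdf(10, mtf)
print(f"{p_survive_4:.3f} {p_fail_year_1:.3f} {p_fail_4_to_6:.4f} {p_survive_10:.4f}")


def erlang_sample(k, t_bar, rng):
    """Time to the k-th event: sum of k exponential intervals drawn by inverse transform."""
    return sum(-t_bar * math.log(1.0 - rng.random()) for _ in range(k))


k, t_bar, n = 3, 2.0, 100_000  # assumed illustrative values
rng = random.Random(3)         # fixed seed for repeatability
s = [erlang_sample(k, t_bar, rng) for _ in range(n)]
mean_s = sum(s) / n
var_s = sum((x - mean_s) ** 2 for x in s) / (n - 1)
print(f"Erlang mean {mean_s:.3f} (k*t_bar   = {k * t_bar})")
print(f"Erlang var  {var_s:.3f} (k*t_bar^2 = {k * t_bar**2})")
```

Because `random.random()` returns values in [0, 1), the argument `1.0 - rng.random()` lies in (0, 1], so the logarithm is always defined.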
The general Erlang distribution has a great many applications in engineering. These include the reliability of systems, the waiting times for users of a system (such as a telephone system or traffic system), and the number of channels a communication system needs to serve a given number of users with random calling times and message lengths.

The Erlang distribution is also related to the gamma distribution by a simple change in notation. Letting $\beta = 1/\bar{\tau}$, and letting $\alpha$ be a continuous parameter that equals $k$ for integral values, the gamma distribution can be written as
$$f(\tau) = \begin{cases} \dfrac{\beta^\alpha\,\tau^{\alpha-1}}{\Gamma(\alpha)}\exp(-\beta\tau), & \tau \ge 0 \\ 0, & \tau < 0 \end{cases} \tag{2-45}$$

The mean and variance of the gamma distribution are $\alpha/\beta$ and $\alpha/\beta^2$, respectively.

Exercise 2-7.2

A television set has a picture tube with a mean time to failure of 10,000 hours. If the set is operated an average of 6 hours per day:

a) What is the probability of picture tube failure within the first year?
b) What is the probability of no failure within 5 years?

Answers: 0.352, 0.197

Delta Distributions

It was noted earlier that when the possible events can assume only a discrete set of values, the appropriate probability density function consists of a set of delta functions. It is desirable to formalize this concept somewhat and indicate some possible applications. As an example, consider the binary waveform illustrated in Figure 2-24. Such a waveform arises in many types of communication and control systems, since it is the waveform with the greatest average power for a given peak value. It will be considered in more detail throughout the study of random processes. The present interest, however, is in a single random variable, $X = x(t_1)$, at a specified time instant. This random variable can assume only two possible values, $x_1$ or $x_2$. It takes the value $x_1$ with probability $p_1$ and the value $x_2$ with probability $p_2 = 1 - p_1$. Thus, the probability density function for $X$ is

$$f(x) = p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2) \tag{2-46}$$

The mean value of this random variable is evaluated easily as

$$\bar{X} = \int_{-\infty}^{\infty} x\left[p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2)\right]dx = p_1x_1 + p_2x_2$$

The mean-square value is determined similarly from
[Figure 2-24: A general binary waveform $x(t)$.]

$$\overline{X^2} = \int_{-\infty}^{\infty} x^2\left[p_1\,\delta(x - x_1) + p_2\,\delta(x - x_2)\right]dx = p_1x_1^2 + p_2x_2^2$$

Hence, the variance is

$$\sigma_X^2 = \overline{X^2} - (\bar{X})^2 = p_1x_1^2 + p_2x_2^2 - (p_1x_1 + p_2x_2)^2 = p_1p_2(x_1 - x_2)^2$$

in which the fact that $p_2 = 1 - p_1$ has been used to arrive at the final form.

Similar delta distributions exist for random variables that can assume any number of discrete levels. Suppose there are $n$ possible levels $x_1, x_2, \ldots, x_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$. Then the probability density function is

$$f(x) = \sum_{i=1}^{n} p_i\,\delta(x - x_i) \tag{2-47}$$

in which

$$\sum_{i=1}^{n} p_i = 1$$

By exactly the same techniques as above, the mean value of this random variable is shown to be

$$\bar{X} = \sum_{i=1}^{n} p_i x_i$$

and the mean-square value is
$$\overline{X^2} = \sum_{i=1}^{n} p_i x_i^2$$

From these, the variance becomes

$$\sigma_X^2 = \sum_{i=1}^{n} p_i x_i^2 - \left(\sum_{i=1}^{n} p_i x_i\right)^2$$

Multilevel delta distributions also arise in communication and control systems, and in systems requiring analog-to-digital conversion. Typically the number of levels is an integer power of 2, so that the levels can be represented efficiently by a set of binary digits.

Exercise 2-7.3

When three coins are tossed, the random variable is taken to be the number of heads that result. Find

a) the mean value of this random variable
b) the variance of this random variable.

Answers: 1.5, 0.75

2-8 Conditional Probability Distribution and Density Functions

The concept of conditional probability was introduced in Section 1-7 in connection with discrete events. In that context it was the probability of one event given that another event, in the same probability space, had already taken place. It is desirable to extend this concept to continuous random variables. The discussion in the present section is limited to definitions and examples involving a single random variable. The case of two or more random variables is considered in Chapter 3.

The first step is to define the conditional probability distribution function for a random variable $X$, given that an event $M$ has taken place. For the moment, the event $M$ is left arbitrary. The distribution function is denoted and defined by
$$F(x \mid M) = \Pr[X \le x \mid M] = \frac{\Pr\{X \le x, M\}}{\Pr(M)}, \qquad \Pr(M) > 0 \tag{2-48}$$

where $\{X \le x, M\}$ is the event of all outcomes $\xi$ such that

$$X(\xi) \le x \quad \text{and} \quad \xi \in M$$

and $X(\xi)$ is the value of the random variable $X$ when the outcome of the experiment is $\xi$. Hence $\{X \le x, M\}$ is the continuous counterpart of the set product used in the earlier definition (1-17). It can be shown that $F(x \mid M)$ is a valid probability distribution function and, hence, has the same properties as any other distribution function. In particular, it has the following characteristics:

1. $0 \le F(x \mid M) \le 1$, $\quad -\infty < x < \infty$
2. $F(-\infty \mid M) = 0$, $\quad F(\infty \mid M) = 1$
3. $F(x \mid M)$ is nondecreasing as $x$ increases
4. $\Pr[x_1 < X \le x_2 \mid M] = F(x_2 \mid M) - F(x_1 \mid M) \ge 0$ for $x_1 < x_2$

Next, something must be said about the event $M$ upon which the probability is conditioned. Several different possibilities arise. For example:

1. Event $M$ may be expressible in terms of the random variable $X$. Examples of this are considered in this section.
2. Event $M$ may depend upon some other random variable, which may be either continuous or discrete. Examples of this are considered in Chapter 3.
3. Event $M$ may depend upon both the random variable $X$ and some other random variable. This more complicated situation will not be considered at all.

As an illustration of the first possibility above, let $M$ be the event

$$M = \{X \le m\}$$

Then the conditional distribution function is, from (2-48),

$$F(x \mid M) = \Pr\{X \le x \mid X \le m\} = \frac{\Pr\{X \le x, X \le m\}}{\Pr\{X \le m\}}$$

There are now two possible situations, depending upon whether $x$ or $m$ is larger. If $x \ge m$, then the event $X \le m$ is contained in the event $X \le x$, and

$$\Pr\{X \le x, X \le m\} = \Pr\{X \le m\}$$

Thus,
$$F(x \mid M) = \frac{\Pr\{X \le m\}}{\Pr\{X \le m\}} = 1, \qquad x \ge m$$

On the other hand, if $x \le m$, then $\{X \le x\}$ is contained in $\{X \le m\}$, and

$$F(x \mid M) = \frac{\Pr\{X \le x\}}{\Pr\{X \le m\}} = \frac{F(x)}{F(m)}, \qquad x < m$$

The resulting conditional distribution function is shown in Figure 2-25.

[Figure 2-25: A conditional probability distribution function.]

The conditional probability density function is related to the distribution function in the same way as before. That is, when the derivative exists,

$$f(x \mid M) = \frac{dF(x \mid M)}{dx}$$

This also has all the properties of a usual probability density function. That is,

1. $f(x \mid M) \ge 0$, $\quad -\infty < x < \infty$
2. $\displaystyle\int_{-\infty}^{\infty} f(x \mid M)\,dx = 1$
3. $F(x \mid M) = \displaystyle\int_{-\infty}^{x} f(u \mid M)\,du$
4. $\displaystyle\int_{x_1}^{x_2} f(x \mid M)\,dx = \Pr[x_1 < X \le x_2 \mid M]$

If the example of Figure 2-25 is continued, the conditional probability density function is

$$f(x \mid M) = \begin{cases} \dfrac{1}{F(m)}\dfrac{dF(x)}{dx} = \dfrac{f(x)}{F(m)} = \dfrac{f(x)}{\int_{-\infty}^{m} f(x)\,dx}, & x < m \\ 0, & x \ge m \end{cases}$$

This is sketched in Figure 2-26.

The conditional probability density function can also be used to find conditional means and conditional expectations. For example, the conditional mean is
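$$E[X \mid M] = \int_{-\infty}^{\infty} x\,f(x \mid M)\,dx \tag{2-49}$$

More generally, the conditional expectation of any $g(X)$ is

$$E[g(X) \mid M] = \int_{-\infty}^{\infty} g(x)\,f(x \mid M)\,dx \tag{2-50}$$

[Figure 2-26: Conditional probability density function corresponding to Figure 2-25.]

As an illustration of the conditional mean, let $f(x)$ in the above example be Gaussian, so that

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right]$$

To keep the example simple, let $m = \bar{X}$, so that

$$F(m) = \int_{-\infty}^{\bar{X}}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right]dx = \frac{1}{2}$$

Thus

$$f(x \mid M) = \begin{cases} \dfrac{f(x)}{1/2} = \dfrac{2}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{(x - \bar{X})^2}{2\sigma^2}\right], & x < \bar{X} \\ 0, & x \ge \bar{X} \end{cases}$$

Hence, with the substitution $u = x - \bar{X}$, the conditional mean is

$$E[X \mid M] = \int_{-\infty}^{\bar{X}}\frac{2x}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \bar{X})^2}{2\sigma^2}\right]dx = \int_{-\infty}^{0}\frac{2(u + \bar{X})}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{u^2}{2\sigma^2}\right)du$$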
In words, this result says that the expected value, or conditional mean, of a Gaussian random variable, given that the random variable is less than its mean, is just $\bar{X} - \sqrt{2/\pi}\,\sigma$.

As a second illustration of this formulation of conditional probability, consider another archery problem. In this case, let the target be 12 inches in diameter, and assume that the standard deviation of the hit positions is 4 inches in both the X-direction and the Y-direction. Hence, the unconditional mean miss distance from the center of the target, over all attempts (including those that miss the target completely), is $\bar{R} = 4\sqrt{\pi/2} = 5.013$ inches. We now seek the conditional mean miss distance given that the arrow strikes the target. Hence, we define the event $M$ to be the event that the miss distance $R$ is less than or equal to six inches. The conditional probability density function is then

$$f(r \mid M) = \frac{f(r)}{F(6)}, \qquad 0 \le r \le 6$$

Since the unconditional density function of $R$ is

$$f(r) = \frac{r}{16}\exp\left(-\frac{r^2}{32}\right), \qquad r \ge 0$$

and the probability that $R$ is less than or equal to 6 is

$$F(6) = 1 - e^{-6^2/32} = 0.675$$

the desired conditional density function is

$$f(r \mid M) = \frac{r}{10.806}\exp\left(-\frac{r^2}{32}\right), \qquad 0 \le r \le 6$$

Hence, the conditional mean value of the miss distance is

$$E[R \mid M] = \int_0^6 \frac{r^2}{10.806}\exp\left(-\frac{r^2}{32}\right)dr = 3.547 \text{ inches}$$

where the integration has been carried out numerically. This value is considerably smaller than the unconditional mean miss distance.

Exercise 2-8.1

A Gaussian random voltage having zero mean and a standard deviation of 100 V is connected in series with a 50-Ω resistor and an ideal diode. Find
the mean value of the resulting current using the concepts of conditional probability.

Answer: 1.5958

Exercise 2-8.2

A traveling wave tube (TWT) has a mean time to failure of 4 years. Given that the TWT has survived for 4 years, find the conditional probability that it will fail between years 4 and 6.

Answer: 0.3935

2-9 Examples and Applications

The preceding sections have introduced some of the basic concepts concerning the probability distribution and density functions for a continuous random variable. Before extending these concepts to more than one variable, it is desirable to consider a few examples illustrating how they might be applied to simple engineering problems.

As a first example, consider the elementary voltage-regulating circuit shown in Figure 2-27(a). It employs a Zener diode having the idealized current-voltage characteristic shown in Figure 2-27(b). Note that the current is zero until the voltage reaches the breakdown value ($V_Z = 10$ V). From then on the current is limited by the external circuit, while the voltage across the diode remains constant. Such a circuit is often used to limit the voltage applied to solid-state devices. For example, the $R_L$ indicated in the circuit may be a transistorized amplifier that is designed to work at 9 V and is damaged if the voltage exceeds 10 V. The supply voltage, $V_s$, comes from a power supply whose nominal voltage is 12 V. Its actual voltage contains a sawtooth ripple and, hence, is a random variable. For purposes of this example, it will be assumed that this random variable has a uniform distribution over the interval from 9 to 15 V.

Zener diodes are rated in terms of their ability to dissipate power as well as their breakdown voltage. It will be assumed that the average power rating of this diode is $W_Z = 3$ W.
It is then desired to find the value of series resistance, $R$, needed to limit the mean dissipation in the Zener diode to this rated value.

When the Zener diode is conducting, the voltage across it is $V_Z = 10$ V, and the current through it is

$$I_Z = \frac{V_s - V_Z}{R} - I_L$$

where the load current, $I_L$, is 1 A. The power dissipated in the diode is
[Figure 2-27: Zener diode voltage regulator: (a) voltage-regulating circuit and (b) Zener diode characteristic.]

$$W_Z = V_Z I_Z = \frac{V_Z (V_s - V_Z)}{R} - I_L V_Z = \frac{10 V_s - 100}{R} - 10, \qquad V_s > R + 10$$

A sketch of this power as a function of the supply voltage $V_s$ is shown in Figure 2-28, and the probability density functions of $V_s$ and $W_Z$ are shown in Figure 2-29. Note that the density function of $W_Z$ has a large delta function at zero, since the diode is not conducting most of the time. It is uniform for larger values of $W_Z$, since $W_Z$ and $V_s$ are linearly related in this range. From the previous discussion of transformations of density functions in Section 2-3, it is easy to show that

$$f_W(w) = F_V(R + 10)\,\delta(w) + \frac{R}{10}\, f_V\left(\frac{Rw}{10} + R + 10\right), \qquad 0 \le w \le \frac{50}{R} - 10$$

$$f_W(w) = 0 \qquad \text{elsewhere}$$

where $F_V(\cdot)$ is the distribution function of $V_s$. Hence, the area of the delta function is simply the probability that the supply voltage $V_s$ is less than the value that causes diode conduction to start.

[Figure 2-28: Relation between diode power dissipation and supply voltage; $W_Z$ is zero up to $V_s = R + 10$ and increases linearly above it.]
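The mean dissipation can also be obtained without the density $f_W(w)$, by averaging $W_Z(V_s)$ directly over the uniform supply voltage. The Python sketch below does this with a midpoint rule. It then bisects on $R$ to find the smallest resistance whose mean dissipation does not exceed 3 W. The result should land near the 2.19 Ω found analytically in the text. At $R = 10/3$ Ω it should agree with $(5 - R)^2/(1.2R) \approx 0.69$ W. The function names, the grid size, and the bisection bracket of 1 to 5 Ω are assumptions made for this sketch.

```python
V_Z = 10.0              # Zener breakdown voltage (V)
I_L = 1.0               # load current (A)
V_LO, V_HI = 9.0, 15.0  # supply voltage assumed uniform on [9, 15] V


def diode_power(v_s, r):
    """Zener dissipation W_z = V_z * I_z; zero while the diode is not conducting."""
    i_z = (v_s - V_Z) / r - I_L
    return V_Z * i_z if i_z > 0 else 0.0


def mean_diode_power(r, n=6000):
    """E[W_z]: average of W_z(v_s) over the uniform supply voltage (midpoint rule)."""
    h = (V_HI - V_LO) / n
    total = sum(diode_power(V_LO + (k + 0.5) * h, r) for k in range(n))
    return total * h / (V_HI - V_LO)


def min_resistance(w_max=3.0, lo=1.0, hi=5.0, tol=1e-6):
    """Smallest R with E[W_z] <= w_max, found by bisection.

    E[W_z] decreases monotonically with R on (0, 5), so bisection is valid there.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_diode_power(mid) > w_max:
            lo = mid  # too much dissipation: a larger R is needed
        else:
            hi = mid
    return hi


r_min = min_resistance()
print(f"minimum R = {r_min:.3f} ohm")
print(f"E[W_z] at R = 10/3 ohm: {mean_diode_power(10 / 3):.3f} W")
```

Averaging $W_Z(V_s)$ directly avoids handling the delta function at $w = 0$. Both routes express the same expectation.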
[Figure 2-29: Probability density functions for supply voltage and diode power dissipation: (a) probability density function for $V_s$, of height 1/6 between 9 and 15 V; (b) probability density function for $W_Z$, a delta function of area $F_V(R + 10)$ at the origin plus a uniform part of height $R/60$.]

The mean value of diode power dissipation is now given by

$$E[W_Z] = \bar{W}_Z = \int_{-\infty}^{\infty} w f_W(w)\, dw = \int_{-\infty}^{\infty} w F_V(R + 10)\, \delta(w)\, dw + \int_{0}^{\infty} w\, \frac{R}{10}\, f_V\left(\frac{Rw}{10} + R + 10\right) dw$$

The first integral has a value of zero, since the delta function is at $w = 0$. The second integral can be written in terms of the uniform density function $[f_V(v) = 1/6,\ 9 < v \le 15]$ as

$$\bar{W}_Z = \int_0^{(50/R) - 10} w \left(\frac{R}{10}\right)\left(\frac{1}{6}\right) dw = \frac{(5 - R)^2}{1.2R}$$

Since the mean value of diode power dissipation is to be less than or equal to 3 watts, it follows that

$$\frac{(5 - R)^2}{1.2R} \le 3$$

from which

$$R \ge 2.19\ \Omega$$

Clearing fractions, the inequality becomes $R^2 - 13.6R + 25 \le 0$, whose smaller root is $R = 2.19$. For $R \ge 5$ the diode never conducts at all, because $V_s$ never exceeds $R + 10$.

It may now be concluded that any value of $R$ greater than 2.19 Ω would be satisfactory from the standpoint of limiting the mean value of power dissipation in the Zener diode to 3 W. The actual choice of $R$ would be determined by the desired value of output voltage at the nominal supply voltage of 12 V. If this desired voltage is 9 V (as suggested above), then $R$ must be
$$R = \frac{3}{9/10} = 3.33\ \Omega$$

since it must drop $12 - 9 = 3$ V while carrying the 9/10 A that the load draws at 9 V. This value is greater than the minimum value of 2.19 Ω and, hence, would be satisfactory.

As another example, consider the problem of selecting a multiplier resistor for a dc voltmeter, as shown in Figure 2-30. It will be assumed that the dc instrument produces full-scale deflection when 100 µA is passing through the coil, and that the coil has a resistance of 1000 Ω. It is desired to select a multiplier resistor $R$ such that this instrument will read full scale when 10 V is applied. Thus, the nominal value of $R$ to accomplish this (which will be designated as $R^*$) is

$$R^* = \frac{10}{10^{-4}} - 1000 = 9.9 \times 10^4\ \Omega$$

However, the actual resistor used will be selected at random from a bin of resistors marked $10^5$ Ω. Because of manufacturing tolerances, the actual resistance is a random variable having a mean of $10^5$ Ω and a standard deviation of 1000 Ω. It will also be assumed that the actual resistance is a Gaussian random variable. (This is a customary assumption when deviations around the mean are small, even though it can never be precisely true for quantities that must always be positive, like the resistance.) On the basis of these assumptions it is desired to find the probability that the resulting voltmeter will be accurate to within 2%.[6]

The smallest value of resistance that would be acceptable is

$$R_{\min} = \frac{10 - 0.2}{10^{-4}} - 1000 = 9.7 \times 10^4\ \Omega$$

while the largest value is

$$R_{\max} = \frac{10 + 0.2}{10^{-4}} - 1000 = 10.1 \times 10^4\ \Omega$$

The probability that a resistor selected at random will fall between these two limits is

$$P_c = \Pr[9.7 \times 10^4 < R \le 10.1 \times 10^4] = \int_{9.7 \times 10^4}^{10.1 \times 10^4} f_R(r)\, dr \tag{2-51}$$

[Figure 2-30: Selection of a voltmeter resistor: multiplier $R$ in series with a 100-µA dc instrument of resistance $R_m = 1000$ Ω.]

[6] This is interpreted to mean that the error in voltmeter reading due to the resistor value is less than or equal to 2% of the full-scale reading.
where $f_R(r)$ is the Gaussian probability density function for $R$ and is given by

$$f_R(r) = \frac{1}{\sqrt{2\pi}\,(1000)} \exp\left[-\frac{(r - 10^5)^2}{2(10^6)}\right]$$

The integral in (2-51) can be expressed in terms of the standard normal distribution function, $\Phi(\cdot)$, as discussed in Section 2-5. Thus, $P_c$ becomes

$$P_c = \Phi\left(\frac{10.1 \times 10^4 - 10^5}{10^3}\right) - \Phi\left(\frac{9.7 \times 10^4 - 10^5}{10^3}\right)$$

which can be simplified to

$$P_c = \Phi(1) - \Phi(-3) = \Phi(1) - [1 - \Phi(3)]$$

Using the tables in Appendix D, this becomes

$$P_c = 0.8413 - [1 - 0.9987] = 0.8400$$

Thus, it appears that even though the resistors are selected from a supply that is nominally incorrect, there is still a substantial probability that the resulting instrument will be within acceptable limits of accuracy.

The third example considers an application of conditional probability. This example considers a traffic measurement system that measures the speed of all vehicles on an expressway and records those speeds in excess of the speed limit of 70 miles per hour (mph). If the vehicle speed is a random variable with a Rayleigh distribution and a most probable value equal to 50 mph, it is desired to find the mean value of the excess speed. This is equivalent to finding the conditional mean of vehicle speed, given that the speed is greater than the limit, and subtracting the limit from it.

Letting the vehicle speed be $S$, the conditional distribution function that is sought is

$$F[s \mid S > 70] = \frac{\Pr\{S \le s,\ S > 70\}}{\Pr\{S > 70\}} \tag{2-52}$$

Since the numerator is nonzero only when $s > 70$, (2-52) can be written as

$$F[s \mid S > 70] = \begin{cases} 0, & s \le 70 \\ \dfrac{F(s) - F(70)}{1 - F(70)}, & s > 70 \end{cases} \tag{2-53}$$

where $F(\cdot)$ is the probability distribution function for the random variable $S$. The numerator of (2-53) is simply the probability that $S$ is between 70 and $s$, while the denominator is the probability that $S$ is greater than 70.
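The conditional probability density function is found by differentiating (2-53) with respect to $s$. Thus,

$$f(s \mid S > 70) = \begin{cases} 0, & s \le 70 \\ \dfrac{f(s)}{1 - F(70)}, & s > 70 \end{cases}$$

where $f(s)$ is the Rayleigh density function given by

$$f(s) = \begin{cases} \dfrac{s}{(50)^2} \exp\left[-\dfrac{s^2}{2(50)^2}\right], & s \ge 0 \\ 0, & s < 0 \end{cases} \tag{2-54}$$

These functions are sketched in Figure 2-31. The quantity $F(70)$ is easily obtained from (2-54) as

$$F(70) = \int_0^{70} \frac{s}{(50)^2} \exp\left[-\frac{s^2}{2(50)^2}\right] ds = 1 - \exp\left[-\frac{49}{50}\right]$$

Hence,

$$1 - F(70) = \exp\left[-\frac{49}{50}\right]$$

The conditional expectation is given by the integral below. The last step follows from integrating by parts:

$$E[S \mid S > 70] = \frac{1}{\exp[-49/50]} \int_{70}^{\infty} \frac{s^2}{(50)^2} \exp\left[-\frac{s^2}{2(50)^2}\right] ds = 70 + 50\sqrt{2\pi}\, \exp\left[\frac{49}{50}\right] \left\{1 - \Phi\left(\frac{7}{5}\right)\right\} \approx 70 + 27$$

[Figure 2-31: Conditional and unconditional density functions for a Rayleigh-distributed random variable.]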
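As a numerical check on the conditional mean, the Python sketch below evaluates the closed form $E[S \mid S > a] - a = \sigma\sqrt{2\pi}\, e^{a^2/2\sigma^2}\,[1 - \Phi(a/\sigma)]$ with $a = 70$ and $\sigma = 50$. It writes $1 - \Phi(z)$ as $\tfrac12\operatorname{erfc}(z/\sqrt{2})$. It then compares this value with a direct numerical integration of $\Pr(S > s \mid S > a)$ over $s > a$, which for a nonnegative excess equals its mean. Both should give an excess speed of about 27 mph (26.97 to four figures). The function names, the integration cutoff at $a + 20\sigma$, and the grid size are assumptions of this sketch.

```python
import math


def excess_speed_closed_form(a, sigma):
    """E[S - a | S > a] for a Rayleigh variable whose most probable value is sigma.

    Integration by parts gives sigma*sqrt(2*pi)*exp(a^2/(2 sigma^2))*[1 - Phi(a/sigma)].
    """
    z = a / sigma
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)
    return sigma * math.sqrt(2.0 * math.pi) * math.exp(z * z / 2.0) * tail


def excess_speed_numeric(a, sigma, n=100000, span=20.0):
    """Same quantity as the integral of Pr(S > s | S > a) over s > a (Simpson's rule).

    For a Rayleigh variable, Pr(S > s | S > a) = exp(-(s^2 - a^2) / (2 sigma^2)).
    The integral is truncated at s = a + span*sigma; n must be even.
    """
    h = span * sigma / n
    total = 0.0
    for i in range(n + 1):
        s = a + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.exp(-(s * s - a * a) / (2.0 * sigma * sigma))
    return total * h / 3.0


closed = excess_speed_closed_form(70.0, 50.0)
numeric = excess_speed_numeric(70.0, 50.0)
print(f"mean excess speed: closed form {closed:.3f} mph, numeric {numeric:.3f} mph")
```

The survival-function form avoids differentiating $F$ at all. This makes it a convenient cross-check on the integration by parts.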
Thus, the mean value of the excess speed is approximately 27 miles per hour. It is clear from this result that the Rayleigh model is not a realistic one for traffic systems, since an excess speed of 27 miles per hour is much too large for the actual situation. Even so, the above example does illustrate the general technique for finding conditional means.

The final example in this section combines the concepts of both discrete probability and continuous random variables. It deals with problems that might arise in designing a satellite communication system. In such a system, the satellite normally carries a number of traveling wave tubes in order to provide more channels and to extend the useful life of the system as the tubes begin to fail. Consider a case in which the satellite is designed to carry 6 TWTs. It is required that after 5 years of operation there be a probability of 0.95 that at least one of the TWTs is still good. The quantity that we need to find is the mean time to failure (MTF) for each tube in order to achieve this degree of reliability. In order to do this we need to use some of the results discussed in connection with Bernoulli trials in Sec. 1-10. In this case, let $k$ be the number of good TWTs at any point in time and let $p$ be the probability that any TWT is good. Since we want the probability that at least one tube is good to be 0.95, it follows that

$$\Pr(k \ge 1) = 0.95$$

or

$$\sum_{k=1}^{6} P_6(k) = 1 - P_6(0) = 1 - \binom{6}{0} p^0 (1 - p)^6 = 0.95$$

which can be solved to yield $p = 0.393$. If we assume, as usual, that the lifetime of any one TWT follows an exponential distribution, then

$$\int_5^{\infty} \frac{1}{T} e^{-\tau/T}\, d\tau = e^{-5/T} = 0.393, \qquad T = 5.353$$

Thus, the mean time to failure for each TWT must be at least 5.353 years in order to achieve the desired reliability.

A second question that might be asked is, "How many TWTs would be needed to achieve a probability of 0.99 that at least one will still be functioning after 5 years?"
In this case, $n$ is unknown, but for TWTs having the same MTF, the value of $p$ is still 0.393. Thus,

$$1 - P_n(0) = 0.99, \qquad \binom{n}{0} p^0 (1 - p)^n = 0.01$$

This may be solved for $n$ to yield $n = \ln(0.01)/\ln(1 - p) = 9.22$. However, since $n$ must be an integer, this tells us that we must use at least 10 traveling wave tubes to achieve the required reliability.
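The three steps of this example can be scripted directly. The first solves $1 - (1 - p)^6 = 0.95$ for $p$. The second converts $p = e^{-5/T}$ into a mean time to failure. The third solves $1 - (1 - p)^n = 0.99$ for $n$. The Python sketch below assumes, as the text does, independent tubes with exponentially distributed lifetimes; the function names are illustrative. Because it keeps $p$ unrounded, it gives $T \approx 5.354$ years rather than 5.353. That difference only reflects rounding.

```python
import math


def required_survival_prob(n_tubes, target):
    """p such that Pr(at least one of n_tubes survives) = 1 - (1 - p)**n_tubes = target."""
    return 1.0 - (1.0 - target) ** (1.0 / n_tubes)


def required_mtf(p, mission_years):
    """Exponential lifetime: Pr(life > t) = exp(-t / T) = p, so T = -t / ln(p)."""
    return -mission_years / math.log(p)


def tubes_needed(p, target):
    """Real-valued n solving 1 - (1 - p)**n = target, and the integer n to use."""
    n_real = math.log(1.0 - target) / math.log(1.0 - p)
    return n_real, math.ceil(n_real)


p = required_survival_prob(6, 0.95)
mtf = required_mtf(p, 5.0)
n_real, n_tubes = tubes_needed(p, 0.99)
print(f"p = {p:.3f}, MTF = {mtf:.3f} years, n = {n_real:.2f} -> {n_tubes} tubes")
```

Rounding $n$ up with `math.ceil` mirrors the argument in the text: 9 tubes fall just short of 0.99, so 10 are required.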
Exercise 2-9.1

The current in a semiconductor diode is often modeled by the Shockley equation

$$I = I_0\left[e^{\eta V} - 1\right]$$

in which $V$ is the voltage across the diode, $I_0$ is the reverse current, $\eta$ is a constant that depends upon the physical diode and the temperature, and $I$ is the resulting diode current. For purposes of this exercise, assume that $I_0 = 10^{-9}$ and $\eta = 12$. Find the resulting mean value of current if the diode voltage is a random variable that is uniformly distributed between 0 and 2.

Answer: 1.1037

Exercise 2-9.2

A Thevenin equivalent source has an open-circuit voltage of 8 V and a source resistance that is a random variable uniformly distributed between 2 Ω and 10 Ω. Find
a) the value of load resistance that should be connected to this source in order that the mean value of the power delivered to the load is a maximum
b) the resulting mean value of power.

Answers: 4.47, 3.06

PROBLEMS

2-1.1 For each of the following situations, list any quantities that might reasonably be considered a random variable, state whether they are continuous or discrete, and indicate a reasonable range of values for each.
a) A weather forecast gives the prediction for July 4th as high temperature, 84; low temperature, 67; wind, 8 mph; humidity, 75%; THI, 72; sunrise, 5:05 am; sunset, 8:45 pm.
b) A traffic survey on a busy street yields the following values: number of vehicles per minute, 26; average speed, 35 mph; ratio of cars to trucks, 6.81; average weight, 4000 lb; number of accidents per day, 5.
c) An electronic circuit contains 15 ICs, 12 LEDs, 43 resistors, and 12 capacitors. The resistors are all marked 1000 Ω, the capacitors are all marked 0.01 µF, and the nominal supply voltage for the circuit is 5 V.

2-1.2 State whether each of the following random variables is continuous or discrete and indicate a reasonable range of values for each.
a) The outcome associated with rolling a pair of dice.
b) The outcome resulting from measuring the voltage of a 12-V storage battery.
c) The outcome associated with randomly selecting a telephone number from the telephone directory.
d) The outcome resulting from weighing adult males.

2-2.1 When 10 coins are flipped, the event of interest is the number of heads. Let this number be the random variable.
a) Plot the distribution function for this random variable.
b) What is the probability that the random variable is between six and nine inclusive?
c) What is the probability that the random variable is greater than or equal to eight?

2-2.2 A random variable has a probability distribution function given by

$$F_X(x) = \begin{cases} 0, & -\infty < x \le -1 \\ 0.5 + 0.5x, & -1 < x < 1 \\ 1, & 1 \le x < \infty \end{cases}$$

a) Find the probability that $x = \ldots$.
b) Find the probability that $x > \ldots$.
c) Find the probability that $-0.5 < x \le 0.5$.

2-2.3 A probability distribution function for a random variable $X$ has the form

$$F_X(x) = \begin{cases} A\{1 - \exp[-(x - 1)]\}, & 1 < x < \infty \\ 0, & -\infty < x \le 1 \end{cases}$$
a) For what value of $A$ is this a valid probability distribution function?
b) What is $F_X(2)$?
c) What is the probability that the random variable lies in the interval $2 < X < \infty$?
d) What is the probability that the random variable lies in the interval $1 < X \le 3$?

2-2.4 A random variable $X$ has a probability distribution function of the form

$$F_X(x) = \begin{cases} 0, & -\infty < x \le -2 \\ A(1 + \cos bx), & \ldots \\ 1, & 2 < x < \infty \end{cases}$$

a) Find the values of $A$ and $b$ that make this a valid probability distribution function.
b) Find the probability that $X$ is greater than 1.
c) Find the probability that $X$ is negative.

2-3.1 a) Find the probability density function of the random variable of Problem 2-2.1 and sketch it.
b) Using the probability density function, find the probability that the random variable is in the range between four and seven inclusive.
c) Using the probability density function, find the probability that the random variable is less than four.

2-3.2 a) Find the probability density function of the random variable of Problem 2-2.3 and sketch it.
b) Using the probability density function, find the probability that the random variable is in the range $2 < X \le 3$.
c) Using the probability density function, find the probability that the random variable is less than 2.

2-3.3 a) A random variable $X$ has a probability density function of the form

$$f_X(x) = \exp(-2|x|), \qquad -\infty < x < \infty$$
A second random variable $Y$ is related to $X$ by $Y = X^2$. Find the probability density function of the random variable $Y$.
b) Find the probability that $Y$ is greater than 2.

2-3.4 a) A random variable $Y$ is related to the random variable $X$ of Problem 2-3.3 by $Y = 3X - 4$. Find the probability density function of the random variable $Y$.
b) Find the probability that $Y$ is negative.
c) Find the probability that $Y$ is greater than $X$.

2-4.1 For the random variable of Problem 2-3.2 find
a) the mean value of $X$
b) the mean-square value of $X$
c) the variance of $X$.

2-4.2 For the random variable $X$ of Problem 2-2.4 find
a) the mean value of $X$
b) the mean-square value of $X$
c) the third central moment of $X$
d) the variance of $X$.

2-4.3 A random variable $Y$ has a probability density function of the form

$$f(y) = \begin{cases} Ky, & 0 < y \le 6 \\ 0, & \text{elsewhere} \end{cases}$$

a) Find the value of $K$ for which this is a valid probability density function.
b) Find the mean value of $Y$.
c) Find the mean-square value of $Y$.
d) Find the variance of $Y$.
e) Find the third central moment of $Y$.
f) Find the $n$th moment, $E[Y^n]$.

2-4.4 A power supply has five intermittent loads connected to it, and each load, when in operation, draws a power of 10 W. Each load is in operation only one-quarter of the time and operates independently of all other loads.
a) Find the mean value of the power required by the loads.
b) Find the variance of the power required by the loads.
c) If the power supply can provide only 40 W, find the probability that it will be overloaded.

2-4.5 A random variable $X$ has a probability density function of the form

$$f_X(x) = \begin{cases} a x^2, & \ldots \\ a x, & \ldots \end{cases}$$

a) Find the value of $a$.
b) Find the mean of the random variable $X$.
c) Find the probability that $2 < x \le 3$.

2-5.1 A Gaussian random voltage has a mean value of 5 and a variance of 16.
a) What is the probability that an observed value of the voltage is greater than zero?
b) What is the probability that an observed value of the voltage is greater than zero but less than or equal to the mean value?
c) What is the probability that an observed value of the voltage is greater than twice the mean value?

2-5.2 For the Gaussian random variable of Problem 2-5.1 find
a) the fourth central moment
b) the fourth moment
c) the third central moment
d) the third moment.
2-5.3 A Gaussian random current has a probability of 0.5 of having a value less than or equal to 1.0. It also has a probability of 0.0228 of having a value greater than 5.0.
a) Find the mean value of this random variable.
b) Find the variance of this random variable.
c) Find the probability that the random variable has a value less than or equal to 3.0.

2-5.4 Make a plot of the function $Q(\alpha)$ over the range $\alpha = 5$ to 6. On the same plot show the approximation as given by equation (2-25).

2-5.5 A common method for detecting a signal in the presence of noise is to establish a threshold level and compare the value of any observation with this threshold. If the threshold is exceeded, it is decided that a signal is present. Sometimes, of course, noise alone will exceed the threshold, and this is known as a "false alarm." Usually, it is desired to make the probability of a false alarm very small. At the same time, we would like any observation that does contain a signal plus the noise to exceed the threshold with a large probability. This is the probability of detection and should be as close to 1.0 as possible. Suppose we have Gaussian noise with zero mean and a variance of 1 V², and we set a threshold level of 5 V.
a) Find the probability of false alarm.
b) If a signal having a value of 8 V is observed in the presence of this noise, find the probability of detection.

2-5.6 A Gaussian random variable has a mean of 1 and a variance of 4.
a) Generate a histogram of samples of this random variable using 1000 samples.
b) Make a histogram of the square of this random variable using 1000 samples and 20 bins. Modify the amplitude of the histogram to approximate the probability density function.

2-6.1 A Gaussian random current having zero mean and a variance of 4 A² is passed through a resistance of 3 Ω.
a) Find the mean value of the power dissipated.
b) Find the variance of the power dissipated.
c) Find the probability that the instantaneous power will exceed 36 W.
2-6.2 A random variable $X$ is Gaussian with zero mean and a variance of 1.0. Another random variable, $Y$, is defined by $Y = X^3$.
a) Write the probability density function for the random variable $Y$.
b) Find the mean value of $Y$.
c) Find the variance of $Y$.

2-6.3 A current having a Rayleigh probability density function is passed through a resistor having a resistance of $2\pi$ Ω. The mean value of the current is 2 A.
a) Find the mean value of the power dissipated in the resistor.
b) Find the probability that the dissipated power is less than or equal to 12 W.
c) Find the probability that the dissipated power is greater than 72 W.

2-6.4 Marbles rolling on a flat surface have components of velocity in orthogonal directions that are independent Gaussian random variables with zero mean and a standard deviation of 4 ft/s.
a) Find the most probable speed of the marbles.
b) Find the mean value of the speed.
c) What is the probability of finding a marble with a speed greater than 10 ft/s?

2-6.5 The average speed of a nitrogen molecule in air at 20°C is about 600 m/s. Find:
a) The variance of molecule speed.
b) The most probable molecule speed.
c) The rms molecule speed.

2-6.6 Five independent observations of a Gaussian random voltage with zero mean and unit variance are made, and a new random variable $X^2$ is formed from the sum of the squares of these random voltages.
a) Find the mean value of $X^2$.
b) Find the variance of $X^2$.
c) What is the most probable value of $X^2$?

2-6.7 The log-normal density function is often expressed in terms of decibels rather than nepers. In this case, the Gaussian random variable $Y$ is related to the log-normal random variable by $Y = 10 \log_{10} X$.
a) Write the probability density function for $X$ when this relation is used.
b) Write an expression for the mean value of $X$.
c) Write an expression for the variance of $X$.

2-7.1 A random variable $\Theta$ is uniformly distributed over a range of 0 to $2\pi$. Another random variable $X$ is related to $\Theta$ by

$$X = \cos\Theta$$

a) Find the probability density function of $X$.
b) Find the mean value of $X$.
c) Find the variance of $X$.
d) Find the probability that $X > 0.5$.

2-7.2 A continuous-valued random voltage ranging between -20 V and +20 V is to be quantized so that it can be represented by a binary sequence.
a) If the rms quantizing error is to be less than 1% of the maximum value of the voltage, find the minimum number of quantizing levels that are required.
b) If the number of quantizing levels is to be a power of 2, find the minimum number of quantizing levels that will still meet the requirement.
c) How many binary digits are required to represent each quantizing level?

2-7.3 A communications satellite is designed to have a mean time to failure (MTF) of 6 years. If the actual time to failure is a random variable that is exponentially distributed, find
a) the probability that the satellite will fail sooner than six years
b) the probability that the satellite will survive for 10 years or more
c) the probability that the satellite will fail during the sixth year.
2-7.4 A homeowner buys a package containing four light bulbs, each specified to have an average lifetime of 2000 hours. One bulb is placed in a single-bulb table lamp, and the remaining bulbs are used one after another to replace ones that burn out in this same lamp.
a) Find the expected lifetime of the set of four light bulbs.
b) Find the probability that the four light bulbs will last 10,000 hours or more.
c) Find the probability that the four light bulbs will all burn out in 4000 hours or less.

2-7.5 A continuous-valued signal has a probability density function that is uniform over the range from -8 V to +8 V. It is sampled and quantized into eight equally spaced levels ranging from -7 to +7.
a) Write the probability density function for the discrete random variable representing one sample.
b) Find the mean value of this random variable.
c) Find the variance of this random variable.

2-8.1 a) For the communication satellite system of Problem 2-7.3, find the conditional probability that the satellite will survive for 10 years or more given that it has survived for 5 years.
b) Find the conditional mean lifetime of the system given that it has survived for 3 years.

2-8.2 a) For the random variable $X$ of Problem 2-7.1, find the conditional probability density function $f(x \mid M)$, where $M$ is the event $0 \le \Theta \le \ldots$. Sketch this density function.
b) Find the conditional mean $E[X \mid M]$ for the same event $M$.

2-8.3 A laser weapon is fired many times at a circular target that is 1 m in diameter, and it is found that one-tenth of the shots miss the target entirely.
a) For those shots that hit the target, find the conditional probability that they will hit within 0.1 m of the center.
b) For those shots that miss the target completely, find the conditional probability that they come within 0.3 m of the edge of the target.

2-8.4 Consider again the threshold detection system described in Problem 2-5.5.
a) When noise only is present, find the conditional mean value of the noise that exceeds the threshold.
b) Repeat part (a) when both the specified signal and noise are present.

2-9.1 Different types of electronic ac voltmeters produce deflections that are proportional to different characteristics of the applied waveforms. In most cases, however, the scale is calibrated so that the voltmeter correctly indicates the rms value of a sine wave. For other types of waveforms, the meter reading may not be equal to the rms value. Suppose the following instruments are connected to a Gaussian random voltage having zero mean and a standard deviation of 10 V. What will each read?
a) An instrument in which the deflection is proportional to the average of the full-wave rectified waveform. That is, if $X(t)$ is applied, the deflection is proportional to $E[|X(t)|]$.
b) An instrument in which the deflection is proportional to the average of the envelope of the waveform. Remember that the envelope of a Gaussian waveform has a Rayleigh distribution.

2-9.2 In a radar system, the reflected signal pulses may have amplitudes that are Rayleigh distributed. Let the mean value of these pulses be $\ldots$. However, the only pulses that are displayed on the radar scope are those for which the pulse amplitude $R$ is greater than some threshold $r_0$, so that the effects of system noise can be suppressed.
a) Determine the probability density function of the displayed pulses; that is, find $f(r \mid R > r_0)$. Sketch this density function.
b) Find the conditional mean of the displayed pulses if $r_0 = 0.5$.

2-9.3 A limiter has an input-output characteristic defined by

$$V_{out} = \begin{cases} -B, & V_{in} < -A \\ \dfrac{B}{A} V_{in}, & -A < V_{in} < A \\ B, & V_{in} > A \end{cases}$$

a) If the input is a Gaussian random variable $V$ with a mean value of $\bar{V}$ and a variance of $\sigma_V^2$, write a general expression for the probability density function of the output.
b) If $A = B = 5$ and the input is uniformly distributed from -2 to 8, find the mean value of the output.
2-9.4 Let the input to the limiter of Problem 2-9.3(b) be

$$V(t) = 10 \sin(\omega t + \Theta)$$

where $\Theta$ is a random variable that is uniformly distributed from 0 to $2\pi$. The output of the limiter is sampled at an arbitrary time $t$ to obtain a random variable $V_1$.
a) Find the probability density function of $V_1$.
b) Find the mean value of $V_1$.
c) Find the variance of $V_1$.

2-9.5 As an illustration of the central limit theorem, generate a sample of 500 random variables, each of which is the sum of 20 other independent random variables having an exponential probability density function of the form $f(x) = \exp(-x)u(x)$. Normalize these random variables in accordance with equation (2-28) and make a histogram of the result normalized to approximate the probability density function. Superimpose on the histogram the plot of a Gaussian probability density function having zero mean and unit variance.

References

See references for Chapter 1, particularly Clarke and Disney, Helstrom, and Papoulis.
CHAPTER 3

Several Random Variables

3-1 Two Random Variables

All of the discussion so far has concentrated on situations involving a single random variable. This random variable may be, for example, the value of a voltage or current at a particular instant of time. It should be apparent, however, that saying something about a random voltage or current at only one instant of time is not adequate as a means of describing the nature of complete time functions. Such time functions, even if of finite duration, have an infinite number of random variables associated with them. This raises the question, therefore, of how one can extend the probabilistic description of a single random variable to include the more realistic situation of continuous time functions. The purpose of this section is to take the first step of that extension by considering two random variables. It might appear that this is an insignificant advance toward the goal of dealing with an infinite number of random variables. However, it will become apparent later that this is really all that is needed, provided that the two random variables are separated in time by an arbitrary time interval. That is, if the random variables associated with any two instants of time can be described, then all of the information needed to carry out most of the usual types of systems analysis is available. Another situation that can arise in systems analysis is that in which it is desired to find the relation between the input and output of the system, either at the same instant of time or at two different time instants. Again, only two random variables are involved.

To deal with situations involving two random variables, it is necessary to extend the concepts of probability distribution and density functions that were discussed in the last chapter. Let the two random variables be designated as $X$ and $Y$ and define a joint probability distribution function as

$$F(x, y) = \Pr[X \le x,\ Y \le y]$$
Note that this is simply the probability of the event that the random variable $X$ is less than or equal to $x$ and that the random variable $Y$ is less than or equal to $y$. As such, it is a straightforward extension of the probability distribution function for one random variable.

The joint probability distribution function has properties that are quite analogous to those discussed previously for a single variable. These may be summarized as follows:

1. $0 \le F(x, y) \le 1, \quad -\infty < x < \infty, \quad -\infty < y < \infty$
2. $F(-\infty, y) = F(x, -\infty) = F(-\infty, -\infty) = 0$
3. $F(\infty, \infty) = 1$
4. $F(x, y)$ is a nondecreasing function as either $x$ or $y$, or both, increase
5. $F(\infty, y) = F_Y(y), \quad F(x, \infty) = F_X(x)$

In item 5 above, the subscripts on $F_Y(y)$ and $F_X(x)$ are introduced to indicate that these two distribution functions are not necessarily the same mathematical function of their respective arguments.

As an example of joint probability distribution functions, consider the outcomes of tossing two coins. Let $X$ be a random variable associated with the first coin; let it have a value of 0 if a tail occurs and a value of 1 if a head occurs. Similarly, let $Y$ be associated with the second coin and also have possible values of 0 and 1. The joint distribution function, $F(x, y)$, is shown in Figure 3-1. Note that it satisfies all of the properties listed above.

[Figure 3-1: A joint probability distribution function.]

It is also possible to define a joint probability density function by differentiating the distribution function. Since there are two independent variables, however, this differentiation must be done partially. Thus,

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y} \tag{3-1}$$

and the sequence of differentiation is immaterial. The probability element is
$$f(x, y)\, dx\, dy = \Pr[x < X \le x + dx,\ y < Y \le y + dy] \tag{3-2}$$

The properties of the joint probability density function are quite analogous to those of a single random variable and may be summarized as follows:

1. $f(x, y) \ge 0, \quad -\infty < x < \infty, \quad -\infty < y < \infty$
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$
3. $F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\, dv\, du$
4. $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy, \quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$
5. $\Pr[x_1 < X \le x_2,\ y_1 < Y \le y_2] = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x, y)\, dy\, dx$

Note that item 2 implies that the volume beneath any joint probability density function must be unity.

As a simple illustration of a joint probability density function, consider a pair of random variables having a density function that is constant between $x_1$ and $x_2$ and between $y_1$ and $y_2$. Thus,

$$f(x, y) = \begin{cases} \dfrac{1}{(x_2 - x_1)(y_2 - y_1)}, & x_1 < x \le x_2,\ y_1 < y \le y_2 \\ 0, & \text{elsewhere} \end{cases} \tag{3-3}$$

This density function and the corresponding distribution function are shown in Figure 3-2. A physical situation in which such a probability density function could arise might be in connection with the manufacture of rectangular semiconductor substrates. Each substrate has two dimensions, and the values of the two dimensions might be random variables that are uniformly distributed between certain limits.

[Figure 3-2: (a) Joint distribution and (b) density functions.]
The joint probability density function can be used to find the expected value of functions of two random variables in much the same way as with the single-variable density function. In general, the expected value of any function, g(X, Y), can be found from

$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f(x,y)\,dx\,dy \tag{3-4}$$

One such expected value that will be considered in great detail in a subsequent section arises when g(X, Y) = XY. This expected value is known as the correlation and is given by

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x,y)\,dx\,dy \tag{3-5}$$

As a simple example of the calculation, consider the joint density function shown in Figure 3-2(b). Since it is zero everywhere except in the specific region, (3-5) may be written as

$$E[XY] = \int_{x_1}^{x_2} dx \int_{y_1}^{y_2} xy\left[\frac{1}{(x_2 - x_1)(y_2 - y_1)}\right] dy = \frac{1}{(x_2 - x_1)(y_2 - y_1)}\left[\frac{x^2}{2}\right]_{x_1}^{x_2}\left[\frac{y^2}{2}\right]_{y_1}^{y_2} = \frac{1}{4}(x_1 + x_2)(y_1 + y_2)$$

Item 4 in the above list of properties of joint probability density functions indicates that the marginal probability density functions can be obtained by integrating the joint density over the other variable. Thus, for the density function in Figure 3-2(b), it follows that

$$f_X(x) = \int_{y_1}^{y_2} \frac{1}{(x_2 - x_1)(y_2 - y_1)}\,dy = \frac{1}{x_2 - x_1}, \quad x_1 < x \le x_2 \tag{3-6a}$$

and

$$f_Y(y) = \int_{x_1}^{x_2} \frac{1}{(x_2 - x_1)(y_2 - y_1)}\,dx = \frac{1}{y_2 - y_1}, \quad y_1 < y \le y_2 \tag{3-6b}$$
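The closed-form correlation $\frac14(x_1+x_2)(y_1+y_2)$ can be sanity-checked by averaging sample products. This Python sketch assumes the same hypothetical limits as a worked case (x1 = 1, x2 = 3, y1 = 0.5, y2 = 1.5) and an arbitrary seed and sample size; it draws X and Y separately from the marginals (3-6a) and (3-6b), which is legitimate here only because (3-3) equals the product of those marginals.

```python
import random

def corr_closed_form(x1, x2, y1, y2):
    """E[XY] for the uniform rectangle density (3-3): (x1 + x2)(y1 + y2)/4."""
    return 0.25 * (x1 + x2) * (y1 + y2)

def corr_monte_carlo(x1, x2, y1, y2, n=200_000, seed=1):
    """Sample-average estimate of E[XY].

    X and Y are drawn separately from the marginals (3-6a) and (3-6b);
    this is valid here only because (3-3) factors into those marginals.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.uniform(x1, x2) * rng.uniform(y1, y2)
    return total / n

# Hypothetical rectangle limits, chosen only for illustration.
x1, x2, y1, y2 = 1.0, 3.0, 0.5, 1.5
exact = corr_closed_form(x1, x2, y1, y2)      # 0.25 * 4.0 * 2.0 = 2.0
estimate = corr_monte_carlo(x1, x2, y1, y2)
print(exact, round(estimate, 3))
```

With 200,000 samples the standard error of the estimate is roughly 0.002, so agreement to about two decimal places is what one should expect.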
Exercise 3-1.1

Consider a rectangular semiconductor substrate with dimensions having mean values of 1 cm and 2 cm. Assume that the actual dimensions in both directions are normally distributed around the means with standard deviations of 0.1 cm. Find
a) the probability that both dimensions are larger than their mean values by 0.05 cm
b) the probability that the larger dimension is greater than its mean value by 0.05 cm and the smaller dimension is less than its mean value by 0.05 cm
c) the mean value of the area of the substrate.
Answers: 0.2223, 0.0952, 2

Exercise 3-1.2

Two random variables X and Y have a joint probability density function given by

$$f(x,y) = \begin{cases} A e^{-(2x + 3y)}, & x \ge 0,\ y \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$

Find
a) the value of A for which this is a valid joint probability density function
b) the probability that X < 1/2 and Y < 1/4
c) the expected value of XY.
Answers: 0.1667, 6, 0.3335

3-2 Conditional Probability-Revisited

Now that the concept of joint probability for two random variables has been introduced, it is possible to extend the previous discussion of conditional probability. The previous definition of the conditional probability density function left the given event M somewhat arbitrary,
although some specific examples were given. In the present discussion, the event M will be related to another random variable, Y.

There are several different ways in which the given event M can be defined in terms of Y. For example, M might be the event $Y \le y$ and, hence, Pr(M) would be just the marginal distribution function of Y, that is, $F_Y(y)$. From the basic definition of the conditional distribution function given in (2-48) of the previous chapter, it would follow that

$$F_X(x \mid Y \le y) = \frac{\Pr[X \le x, M]}{\Pr(M)} = \frac{F(x,y)}{F_Y(y)} \tag{3-7}$$

Another possible definition of M is that it is the event $y_1 < Y \le y_2$. The definition of (2-48) now leads to

$$F_X(x \mid y_1 < Y \le y_2) = \frac{F(x, y_2) - F(x, y_1)}{F_Y(y_2) - F_Y(y_1)} \tag{3-8}$$

In both of the above situations, the event M has a nonzero probability, that is, Pr(M) > 0. However, the most common form of conditional probability is one in which M is the event that Y = y; in almost all these cases Pr(M) = 0, since Y is continuously distributed. Since the conditional distribution function is defined as a ratio, it usually still exists even in these cases. It can be obtained from (3-8) by letting $y_1 = y$ and $y_2 = y + \Delta y$ and by taking a limit as $\Delta y$ approaches zero. Thus,

$$F_X(x \mid Y = y) = \lim_{\Delta y \to 0} \frac{F(x, y + \Delta y) - F(x, y)}{F_Y(y + \Delta y) - F_Y(y)} = \frac{\partial F(x,y)/\partial y}{\partial F_Y(y)/\partial y} = \frac{\int_{-\infty}^{x} f(u, y)\,du}{f_Y(y)} \tag{3-9}$$

The corresponding conditional density function is

$$f_X(x \mid Y = y) = \frac{\partial F_X(x \mid Y = y)}{\partial x} = \frac{f(x,y)}{f_Y(y)} \tag{3-10}$$

and this is the form that is most commonly used. By interchanging X and Y it follows that

$$f_Y(y \mid X = x) = \frac{f(x,y)}{f_X(x)} \tag{3-11}$$

Because this form of conditional density function is so frequently used, it is convenient to adopt a shorter notation. Thus, when there is no danger of ambiguity, the conditional density functions will be written as

$$f(x \mid y) = \frac{f(x,y)}{f_Y(y)} \tag{3-12}$$
$$f(y \mid x) = \frac{f(x,y)}{f_X(x)} \tag{3-13}$$

From these two equations one can obtain the continuous version of Bayes theorem, which was given by (1-21) for the discrete case. Thus, eliminating f(x, y) leads directly to

$$f(y \mid x) = \frac{f(x \mid y)\,f_Y(y)}{f_X(x)} \tag{3-14}$$

It is also possible to obtain another expression for the marginal probability density function from (3-12) or (3-13) by noting that

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_{-\infty}^{\infty} f(x \mid y)\,f_Y(y)\,dy \tag{3-15}$$

and

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_{-\infty}^{\infty} f(y \mid x)\,f_X(x)\,dx \tag{3-16}$$

These equations are the continuous counterpart of (1-20), which applied to the discrete case.

A point that might be noted in connection with the above results is that the joint probability density function completely specifies both marginal density functions and both conditional density functions. As an illustration of this, consider a joint probability density function of the form

$$f(x,y) = \begin{cases} \dfrac{6}{5}(1 - x^2 y), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

Integrating this function with respect to y alone and with respect to x alone yields the two marginal density functions as

$$f_X(x) = \frac{6}{5}\left(1 - \frac{x^2}{2}\right), \quad 0 \le x \le 1$$

and

$$f_Y(y) = \frac{6}{5}\left(1 - \frac{y}{3}\right), \quad 0 \le y \le 1$$

From (3-12) and (3-13) the two conditional density functions may now be written as

$$f(x \mid y) = \frac{1 - x^2 y}{1 - \dfrac{y}{3}}, \quad 0 \le x \le 1,\ 0 \le y \le 1$$
and

$$f(y \mid x) = \frac{1 - x^2 y}{1 - \dfrac{x^2}{2}}, \quad 0 \le x \le 1,\ 0 \le y \le 1$$

The use of conditional density functions arises in many different situations, but one of the most common (and probably the simplest) is that in which some observed quantity is the sum of two quantities, one of which is usually considered to be a signal while the other is considered to be a noise. Suppose, for example, that a signal X(t) is perturbed by additive noise N(t) and that the sum of these two, Y(t), is the only quantity that can be observed. Hence, at some time instant, there are three random variables related by

$$Y = X + N$$

and it is desired to find the conditional probability density function of X given the observed value of Y, that is, f(x|y). The reason for being interested in this is that the most probable values of X, given the observed value Y, may be a reasonable guess, or estimate, of the true value of X when X can be observed only in the presence of noise. From Bayes theorem this conditional probability density is

$$f(x \mid y) = \frac{f(y \mid x)\,f_X(x)}{f_Y(y)}$$

But if X is given, as implied by f(y|x), then the only randomness about Y is the noise N; it is assumed that N is statistically independent of X and that its density function, $f_N(n)$, is known. Thus, since N = Y - X, and X is given,

$$f(y \mid x) = f_N(n = y - x) = f_N(y - x)$$

The desired conditional probability density, f(x|y), can now be written as

$$f(x \mid y) = \frac{f_N(y - x)\,f_X(x)}{f_Y(y)} = \frac{f_N(y - x)\,f_X(x)}{\int_{-\infty}^{\infty} f_N(y - x)\,f_X(x)\,dx} \tag{3-17}$$

in which the integral in the denominator is obtained from (3-16). Thus, if the a priori density function of the signal, $f_X(x)$, and the noise density function, $f_N(n)$, are known, it becomes possible to determine the conditional density function, f(x|y).
When some particular value of Y is observed, say $y_1$, then the value of x for which $f(x \mid y_1)$ is a maximum is a good estimate for the true value of X.

As a specific example of the above application of conditional probability, suppose that the signal random variable, X, has an exponential density function so that

$$f_X(x) = \begin{cases} b\exp(-bx), & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Such a density function might arise, for example, as a signal from a space probe in which the time intervals between counts of high-energy particles are converted to voltage amplitudes for purposes of transmission back to earth. The noise that is added to this signal is assumed to be Gaussian, with zero mean, so that its density function is

$$f_N(n) = \frac{1}{\sqrt{2\pi}\,\sigma_N}\exp\left(-\frac{n^2}{2\sigma_N^2}\right)$$

The marginal density function of Y, which appears in the denominator of (3-17), now becomes

$$f_Y(y) = \int_0^{\infty} \frac{b}{\sqrt{2\pi}\,\sigma_N}\exp\left[-\frac{(y - x)^2}{2\sigma_N^2}\right]\exp(-bx)\,dx = b\exp\left(-by + \frac{b^2\sigma_N^2}{2}\right) Q\left(\frac{b\sigma_N^2 - y}{\sigma_N}\right) \tag{3-18}$$

It should be noted, however, that if one is interested in locating only the maximum of f(x|y), it is not necessary to evaluate $f_Y(y)$ since it is not a function of x. Hence, for a given Y, $f_Y(y)$ is simply a constant.

The desired conditional density function can now be written, from (3-17), as

$$f(x \mid y) = \begin{cases} \dfrac{b}{\sqrt{2\pi}\,\sigma_N f_Y(y)}\exp\left[-\dfrac{(y - x)^2}{2\sigma_N^2}\right]\exp(-bx), & x \ge 0 \\ 0, & x < 0 \end{cases}$$

This may also be written as

$$f(x \mid y) = \begin{cases} \dfrac{b}{\sqrt{2\pi}\,\sigma_N f_Y(y)}\exp\left\{-\dfrac{1}{2\sigma_N^2}\left[x^2 - 2(y - b\sigma_N^2)x + y^2\right]\right\}, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{3-19}$$

and this is sketched in Figure 3-3 for two different values of y.

It was noted earlier that when a particular value of Y is observed, a reasonable estimate for the true value of X is that value of x which maximizes f(x|y). Since the conditional density function is a maximum (with respect to x) when the exponent is a minimum, it follows that this value of x can be determined by equating the derivative of the exponent to zero. Thus

$$2x - 2(y - b\sigma_N^2) = 0$$

or

$$x = y - b\sigma_N^2 \tag{3-20}$$

is the location of the maximum, provided that $y - b\sigma_N^2 > 0$. Otherwise, there is no point of zero slope on f(x|y) and the largest value occurs at x = 0. Suppose, therefore, that the value
$Y = y_1$ is observed. Then, if $y_1 > b\sigma_N^2$, the appropriate estimate for X is $\hat{X} = y_1 - b\sigma_N^2$. On the other hand, if $y_1 < b\sigma_N^2$, the appropriate estimate for X is $\hat{X} = 0$. Note that as the noise gets smaller ($\sigma_N^2 \to 0$), the estimate of X approaches the observed value $y_1$.

[Figure 3-3: The conditional density function, f(x|y): (a) case for $y < b\sigma_N^2$ and (b) case for $y > b\sigma_N^2$.]

Exercise 3-2.1

Two random variables, X and Y, have a joint probability density function of the form

$$f(x,y) = \begin{cases} k(x + 2y), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

Find
a) the value of k for which this is a valid joint probability density function
b) the conditional probability that X is greater than 1/2 given that Y = 1/2
c) the conditional probability that Y is less than, or equal to, 1/2 given that X is 1/2.
Answers: 1/3, 2/3, 7/12

Exercise 3-2.2

A random signal X is uniformly distributed between 10 and 20 V. It is observed in the presence of Gaussian noise N having zero mean and a standard deviation of 5 V.
a) If the observed value of signal plus noise, (X + N), is 5, find the best estimate of the signal amplitude.
b) Repeat (a) if the observed value of signal plus noise is 12.
c) Repeat (a) if the observed value of signal plus noise is 25.
Answers: 20, 10, 12

3-3 Statistical Independence

The concept of statistical independence was introduced earlier in connection with discrete events, but is equally important in the continuous case. Random variables that arise from different physical sources are almost always statistically independent. For example, the random thermal voltage generated by one resistor in a circuit is in no way related to the thermal voltage generated by another resistor. Statistical independence may also exist when the random variables come from the same source but are defined at greatly different times. For example, the thermal voltage generated in a resistor tomorrow almost certainly does not depend upon the voltage today. When two random variables are statistically independent, a knowledge of one random variable gives no information about the value of the other.

The joint probability density function for statistically independent random variables can always be factored into the two marginal density functions. Thus, the relationship

$$f(x,y) = f_X(x)\,f_Y(y) \tag{3-21}$$

can be used as a definition for statistical independence, since it can be shown that this factorization is both a necessary and sufficient condition. As an example, this condition is satisfied by the joint density function given in (3-3). Hence, these two random variables are statistically independent.

One of the consequences of statistical independence concerns the correlation defined by (3-5). Because the joint density function is factorable, (3-5) can be written as

$$E[XY] = \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy = E[X]\,E[Y] = \bar{X}\,\bar{Y} \tag{3-22}$$

Hence, the expected value of the product of two statistically independent random variables is simply the product of their mean values.
The result will be zero, of course, if either random variable has zero mean.

Another consequence of statistical independence is that conditional probability density functions become marginal density functions. For example, from (3-12)

$$f(x \mid y) = \frac{f(x,y)}{f_Y(y)}$$
but if X and Y are statistically independent the joint density function is factorable and this becomes

$$f(x \mid y) = \frac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x)$$

Similarly,

$$f(y \mid x) = \frac{f(x,y)}{f_X(x)} = \frac{f_X(x)\,f_Y(y)}{f_X(x)} = f_Y(y)$$

It may be noted that the random variables described by the joint probability density function of Exercise 3-1.2 are statistically independent since the joint density function can be factored into the product of a function of x only and a function of y only. However, the random variables defined by the joint probability density function of Exercise 3-2.1 are not statistically independent since this density function cannot be factored in this manner.

Exercise 3-3.1

Two random variables, X and Y, have a joint probability density function of the form

$$f(x,y) = \begin{cases} k e^{-(x + y - 1)}, & 0 \le x < \infty,\ 1 \le y < \infty \\ 0, & \text{elsewhere} \end{cases}$$

Find
a) the values of k and a for which the random variables X and Y are statistically independent
b) the expected value of XY.
Answers: 1, 2, 1

Exercise 3-3.2

Two independent random variables, X and Y, have the following probability density functions:

$$f(x) = 0.5e^{-|x - 1|}, \quad -\infty < x < \infty$$
$$f(y) = 0.5e^{-|y - 1|}, \quad -\infty < y < \infty$$
Find the probability that XY > 0.
Answer: 0.6660

3-4 Correlation between Random Variables

As noted above, one of the important applications of joint probability density functions is that of specifying the correlation of two random variables; that is, whether one random variable depends in any way upon another random variable.

If two random variables X and Y have possible values x and y, then the expected value of their product is known as the correlation, defined in (3-5) as

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x,y)\,dx\,dy = \overline{XY} \tag{3-5}$$

If both of these random variables have nonzero means, then it is frequently more convenient to find the correlation with the mean values subtracted out. Thus,

$$E[(X - \bar{X})(Y - \bar{Y})] = \overline{(X - \bar{X})(Y - \bar{Y})} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{X})(y - \bar{Y})\,f(x,y)\,dx\,dy \tag{3-23}$$

This is known as the covariance, by analogy to the variance of a single random variable.

If it is desired to express the degree to which two random variables are correlated without regard to the magnitude of either one, then the correlation coefficient or normalized covariance is the appropriate quantity. The correlation coefficient, which is denoted by $\rho$, is defined as

$$\rho = E\left\{\left[\frac{X - \bar{X}}{\sigma_X}\right]\left[\frac{Y - \bar{Y}}{\sigma_Y}\right]\right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{x - \bar{X}}{\sigma_X}\cdot\frac{y - \bar{Y}}{\sigma_Y}\,f(x,y)\,dx\,dy \tag{3-24}$$

Note that each random variable has its mean subtracted out and is divided by its standard deviation. The resulting random variable is often called the standardized variable and is one with zero mean and unit variance.

An alternative, and sometimes simpler, expression for the correlation coefficient can be obtained by multiplying out the terms in equation (3-24). This yields

$$\rho = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{xy - \bar{X}y - \bar{Y}x + \bar{X}\bar{Y}}{\sigma_X\sigma_Y}\,f(x,y)\,dx\,dy$$

Carrying out the integration leads to
$$\rho = \frac{E(XY) - \bar{X}\bar{Y}}{\sigma_X\sigma_Y} \tag{3-25}$$

To investigate some of the properties of $\rho$, define the standardized variables $\xi$ and $\eta$ as

$$\xi = \frac{X - \bar{X}}{\sigma_X}, \qquad \eta = \frac{Y - \bar{Y}}{\sigma_Y}$$

Then,

$$\rho = E[\xi\eta]$$

Now look at

$$E[(\xi \pm \eta)^2] = E[\xi^2 \pm 2\xi\eta + \eta^2] = 1 \pm 2\rho + 1 = 2(1 \pm \rho)$$

Since $(\xi \pm \eta)^2$ is always nonnegative, its expected value must also be nonnegative, so that

$$2(1 \pm \rho) \ge 0$$

Hence, $\rho$ can never have a magnitude greater than one and thus

$$-1 \le \rho \le 1$$

If X and Y are statistically independent, then

$$\rho = E[\xi\eta] = \bar{\xi}\,\bar{\eta} = 0$$

since both $\xi$ and $\eta$ are zero mean. Thus, the correlation coefficient for statistically independent random variables is always zero. The converse is not necessarily true, however. A correlation coefficient of zero does not automatically mean that X and Y are statistically independent unless they are jointly Gaussian, as will be seen.

To illustrate the above properties, consider two random variables for which the joint probability density function is

$$f(x,y) = \begin{cases} x + y, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

From Property 4 pertaining to joint probability density functions, it is straightforward to obtain the marginal density functions as
$$f_X(x) = \int_0^1 (x + y)\,dy = x + \frac{1}{2}$$

and

$$f_Y(y) = \int_0^1 (x + y)\,dx = y + \frac{1}{2}$$

from which the mean values of X and Y can be obtained immediately as

$$\bar{X} = \int_0^1 x\left(x + \frac{1}{2}\right)dx = \frac{7}{12}$$

with an identical value for E[Y]. The variance of X is readily obtained from

$$\sigma_X^2 = \int_0^1 \left(x - \frac{7}{12}\right)^2\left(x + \frac{1}{2}\right)dx = \frac{11}{144}$$

Again there is an identical value for $\sigma_Y^2$. Also the expected value of XY is given by

$$E[XY] = \int_0^1\int_0^1 xy(x + y)\,dx\,dy = \frac{1}{3}$$

Hence, from (3-25) the correlation coefficient becomes

$$\rho = \frac{E[XY] - \bar{X}\bar{Y}}{\sigma_X\sigma_Y} = \frac{1/3 - (7/12)^2}{11/144} = -\frac{1}{11}$$

Although the correlation coefficient can be defined for any pair of random variables, it is particularly useful for random variables that are individually and jointly Gaussian. In these cases, the joint probability density function can be written as

$$f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\exp\left\{\frac{-1}{2(1 - \rho^2)}\left[\frac{(x - \bar{X})^2}{\sigma_X^2} + \frac{(y - \bar{Y})^2}{\sigma_Y^2} - \frac{2\rho(x - \bar{X})(y - \bar{Y})}{\sigma_X\sigma_Y}\right]\right\} \tag{3-26}$$

Note that when $\rho = 0$, this reduces to

$$f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y}\exp\left\{-\frac{1}{2}\left[\frac{(x - \bar{X})^2}{\sigma_X^2} + \frac{(y - \bar{Y})^2}{\sigma_Y^2}\right]\right\} = f_X(x)\,f_Y(y)$$
which is the form for statistically independent Gaussian random variables. Hence, $\rho = 0$ does imply statistical independence in the Gaussian case.

It is also of interest to use the correlation coefficient to express some results for general random variables. For example, from the definitions of the standardized variables it follows that

$$X = \sigma_X\xi + \bar{X} \quad \text{and} \quad Y = \sigma_Y\eta + \bar{Y}$$

and, hence

$$\overline{XY} = E[(\sigma_X\xi + \bar{X})(\sigma_Y\eta + \bar{Y})] = E(\sigma_X\sigma_Y\xi\eta + \bar{X}\sigma_Y\eta + \bar{Y}\sigma_X\xi + \bar{X}\bar{Y}) = \rho\sigma_X\sigma_Y + \bar{X}\bar{Y}$$

As a further example, consider

$$E[(X \pm Y)^2] = E[X^2 \pm 2XY + Y^2] = \overline{X^2} \pm 2\overline{XY} + \overline{Y^2} = \sigma_X^2 + (\bar{X})^2 \pm 2\rho\sigma_X\sigma_Y \pm 2\bar{X}\bar{Y} + \sigma_Y^2 + (\bar{Y})^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\rho\sigma_X\sigma_Y + (\bar{X} \pm \bar{Y})^2 \tag{3-27}$$

Since the last term is just the square of the mean of (X ± Y), it follows that the variance of (X ± Y) is

$$\sigma_{X \pm Y}^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\rho\sigma_X\sigma_Y \tag{3-28}$$

Note that when random variables are uncorrelated ($\rho = 0$), the variance of the sum or difference is the sum of the variances.

Exercise 3-4.1

Two random variables have means of 1 and variances of 1 and 4, respectively. Their correlation coefficient is 0.5.
a) Find the variance of their sum.
b) Find the mean square value of their sum.
c) Find the mean square value of their difference.
Answers: 19, 17, 10

Exercise 3-4.2

X is a zero mean random variable having a variance of 9 and Y is another
zero mean random variable. The sum of X and Y has a variance of 29 and the difference has a variance of 21.
a) Find the variance of Y.
b) Find the correlation coefficient of X and Y.
c) Find the variance of U = 3X - 5Y.
Answers: 1/6, 421, 16

3-5 Density Function of the Sum of Two Random Variables

The above example illustrates that the mean and variance associated with the sum (or difference) of two random variables can be determined from a knowledge of the individual means and variances and the correlation coefficient without any regard to the probability density functions of the random variables. A more difficult question, however, pertains to the probability density function of the sum of two random variables. The only situation of this sort that is considered here is the one in which the two random variables are statistically independent. The more general case is beyond the scope of the present discussion.

Let X and Y be statistically independent random variables with density functions of $f_X(x)$ and $f_Y(y)$, and let the sum be

$$Z = X + Y$$

It is desired to obtain the probability density function of Z, $f_Z(z)$. The situation is best illustrated graphically as shown in Figure 3-4.

[Figure 3-4: Showing the region for $x + y \le z$.]

The probability distribution function for Z is just

$$F_Z(z) = \Pr(Z \le z) = \Pr(X + Y \le z)$$

and can be obtained by integrating the joint density function, f(x, y), over the region below the line, x + y = z. For every fixed y, x must be such that $-\infty < x < z - y$. Thus,
$$F_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z - y} f(x,y)\,dx\,dy \tag{3-29}$$

For the special case in which X and Y are statistically independent, the joint density function is factorable and (3-29) can be written as

$$F_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z - y} f_X(x)\,f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} f_Y(y)\int_{-\infty}^{z - y} f_X(x)\,dx\,dy$$

The probability density function of Z is obtained by differentiating $F_Z(z)$ with respect to z. Hence

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_Y(y)\,f_X(z - y)\,dy \tag{3-30}$$

since z appears only in the upper limit of the second integral. Thus, the probability density function of Z is simply the convolution of the density functions of X and Y.

It should also be clear that (3-29) could have been written equally well as

$$F_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z - x} f(x,y)\,dy\,dx$$

and the same procedure would lead to

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx \tag{3-31}$$

Hence, just as in the case of system analysis, there are two equivalent forms for the convolution integral.

As a simple example of this procedure, consider the two density functions shown in Figure 3-5. These may be expressed analytically as

$$f_X(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

and

$$f_Y(y) = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$

The convolution must be carried out in two parts, depending on whether z is greater or less than
one. The appropriate diagrams, based on (3-30), are sketched in Figure 3-6. When $0 < z \le 1$, the convolution integral becomes

$$f_Z(z) = \int_0^z (1)\,e^{-(z - x)}\,dx = 1 - e^{-z}, \quad 0 < z \le 1$$

When z > 1, the integral is

$$f_Z(z) = \int_0^1 (1)\,e^{-(z - x)}\,dx = (e - 1)e^{-z}, \quad 1 < z < \infty$$

When z < 0, $f_Z(z) = 0$ since $f_X(x) = 0$ for x < 0 and $f_Y(y) = 0$ for y < 0. The resulting density function is sketched in Figure 3-6(c); its peak value, at z = 1, is $1 - e^{-1} \approx 0.632$.

[Figure 3-5: Density functions for two random variables.]

[Figure 3-6: Convolution of density functions: (a) $0 < z \le 1$, (b) $1 < z < \infty$, and (c) $f_Z(z)$.]

It is straightforward to extend the above result to the difference of two random variables. In this case let

$$Z = X - Y$$

All that is necessary in this case is to replace y by -y in equation (3-30). Thus,
$$f_Z(z) = \int_{-\infty}^{\infty} f_Y(y)\,f_X(z + y)\,dy \tag{3-32}$$

There is also an alternative expression analogous to equation (3-31). This is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(x - z)\,dx \tag{3-33}$$

It is also of interest to consider the case of the sum of two independent Gaussian random variables. Thus let

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[\frac{-(x - \bar{X})^2}{2\sigma_X^2}\right]$$

and

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left[\frac{-(y - \bar{Y})^2}{2\sigma_Y^2}\right]$$

Then if Z = X + Y, the density function for Z is [based on (3-31)]

$$f_Z(z) = \frac{1}{2\pi\sigma_X\sigma_Y}\int_{-\infty}^{\infty}\exp\left[\frac{-(x - \bar{X})^2}{2\sigma_X^2}\right]\exp\left[\frac{-(z - x - \bar{Y})^2}{2\sigma_Y^2}\right]dx$$

It is left as an exercise for the student to verify that the result of this integration is

$$f_Z(z) = \frac{1}{\sqrt{2\pi(\sigma_X^2 + \sigma_Y^2)}}\exp\left\{\frac{-[z - (\bar{X} + \bar{Y})]^2}{2(\sigma_X^2 + \sigma_Y^2)}\right\} \tag{3-34}$$

This result clearly indicates that the sum of two independent Gaussian random variables is still Gaussian with a mean that is the sum of the means and a variance that is the sum of the variances. It should also be apparent that by adding more random variables, the sum is still Gaussian. Thus, the sum of any number of independent Gaussian random variables is still Gaussian. Density functions that exhibit this property are said to be reproducible; the Gaussian case is one of a very limited class of density functions that are reproducible. Although it will not be proven here, it can likewise be shown that the sum of correlated, jointly Gaussian random variables is also Gaussian with a mean that is the sum of the means and a variance that can be obtained from (3-28).

The fact that sums (and differences) of Gaussian random variables are still Gaussian is very important in the analysis of linear systems. It can also be shown that derivatives and integrals of time functions that have a Gaussian distribution are still Gaussian. Thus, one can carry out
the analysis of linear systems for Gaussian inputs with the assurance that signals everywhere in the system are Gaussian. This is analogous to the use of sinusoidal functions for carrying out steady-state system analysis in which signals everywhere in the system are still sinusoids at the same frequency.

From the nature of convolution it is evident that the probability density function of the sum of two random variables will be smoother than the individual probability densities. When more than two random variables are summed, it would be expected that the resulting probability density function would be even smoother. In fact, the repetitive convolution of a probability density function (or virtually any function) converges toward one of the smoothest functions there is, viz., the shape of the Gaussian probability density function. This result was discussed in Section 2-5 in connection with the central limit theorem. From the results given there it can be concluded that summing N independent, identically distributed random variables leads to a new random variable having a mean and variance equal to N times the mean and variance of the original random variables and having a probability density function that approaches Gaussian. This property can be easily demonstrated numerically since the summing of random variables corresponds to convolving their probability density functions. As an example consider a set of random variables having an exponential probability density function of the form

$$f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

The convolution of the probability density functions can be carried out with the following MATLAB program.

% gausconv.m  program to demonstrate central limit theorem
x=0:.1:5;
f=exp(-x); g=f;
clf
axis([0,20,0,1])
hold on
plot(x,f)
for k=1:10
    g=.1*conv(f,g);        % density of the sum of k+1 variables (step 0.1)
    y=.1*(0:length(g)-1);
    plot(y,g)
end
xlabel('y'); ylabel('g(y)')

The resulting plot is shown in Figure 3-7 and is a sequence of probability density functions (PDF) that is clearly converging toward a Gaussian shape.
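The mean-and-variance scaling stated above can also be checked numerically. The Python sketch below is an assumption-laden stand-in for the MATLAB program, not a translation of it: it samples $e^{-x}$ on a finer grid (DX = 0.05 over 0 to 40), normalizes the samples so they sum to one, convolves repeatedly, and reports the mean and variance relative to those of a single term together with a "Gaussian gap" (the largest pointwise difference from a Gaussian density with the same mean and variance), a measure introduced here only for illustration.

```python
import math

DX = 0.05    # grid spacing (assumed; finer than the 0.1 in gausconv.m)
NPTS = 800   # grid covers 0 <= x < 40, ample for sums of up to 10 terms

def exp_pmf():
    """Samples of f(x) = exp(-x), normalized to sum to 1 so that they act as a
    discrete stand-in for the continuous density."""
    w = [math.exp(-k * DX) for k in range(NPTS)]
    s = sum(w)
    return [v / s for v in w]

def convolve(p, q):
    """Discrete convolution of two probability vectors, truncated to NPTS points."""
    out = [0.0] * NPTS
    for i, pi in enumerate(p):
        for j in range(NPTS - i):
            out[i + j] += pi * q[j]
    return out

def mean_var(p):
    """Mean and variance of the grid values k*DX weighted by p."""
    m = sum(k * DX * pk for k, pk in enumerate(p))
    v = sum((k * DX - m) ** 2 * pk for k, pk in enumerate(p))
    return m, v

def gauss_gap(p):
    """Largest pointwise difference between the density implied by p and a
    Gaussian density having the same mean and variance."""
    m, v = mean_var(p)
    c = 1.0 / math.sqrt(2.0 * math.pi * v)
    return max(abs(pk / DX - c * math.exp(-(k * DX - m) ** 2 / (2.0 * v)))
               for k, pk in enumerate(p))

f = exp_pmf()
m1, v1 = mean_var(f)          # close to 1 and 1, the exponential's mean and variance
g = f
results = {}
for n in range(2, 11):        # densities of sums of 2, 3, ..., 10 terms
    g = convolve(g, f)
    results[n] = (mean_var(g), gauss_gap(g))
    (m, v), gap = results[n]
    print(n, round(m / m1, 4), round(v / v1, 4), round(gap, 4))
```

The printed ratios should track N, and the gap should shrink as more terms are added. Normalizing the samples makes the discrete mean about 0.975 rather than exactly 1 at this grid spacing; the N-fold scaling holds regardless.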
[Figure 3-7: Convergence of PDF of sums of random variables toward Gaussian (horizontal axis: y).]

Exercise 3-5.1

Let X and Y be two statistically independent random variables having probability density functions:

$$f_X(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{elsewhere} \end{cases}$$

$$f_Y(y) = \begin{cases} 1, & 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases}$$

For the random variable Z = X + Y find
a) the value for which $f_Z(z)$ is a maximum
b) the probability that Z is less than 0.5.
Answers: 0.125, 1.0
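The convolution integral (3-30) can be evaluated numerically to confirm these answers. This Python sketch is illustrative only: it approximates (3-30) with a midpoint rule, searches for the maximum on a z grid with step 0.01, and integrates $f_Z$ from 0 to 0.5; the grid sizes are arbitrary choices.

```python
def f_uniform(v):
    """Density that is 1 on 0 < v < 1 and 0 elsewhere (both f_X and f_Y here)."""
    return 1.0 if 0.0 < v < 1.0 else 0.0

def f_sum(z, n=2000):
    """f_Z(z) from the convolution integral (3-30), using the midpoint rule
    over 0 < y < 1 (f_Y vanishes outside that interval)."""
    h = 1.0 / n
    return h * sum(f_uniform((k + 0.5) * h) * f_uniform(z - (k + 0.5) * h)
                   for k in range(n))

# (a) locate the maximum of f_Z on a grid of z values from 0 to 2
grid = [i / 100 for i in range(201)]
values = [f_sum(z) for z in grid]
z_max = grid[values.index(max(values))]

# (b) Pr(Z < 0.5) by integrating f_Z from 0 to 0.5 (midpoint rule, 100 steps)
m = 100
p_half = sum(f_sum((i + 0.5) * 0.5 / m) for i in range(m)) * 0.5 / m
print(z_max, round(max(values), 3), round(p_half, 4))
```

The result is the triangular density $f_Z(z) = z$ for $0 < z \le 1$ and $2 - z$ for $1 < z < 2$, so the maximum is at z = 1 and $\Pr(Z < 0.5) = \int_0^{0.5} z\,dz = 0.125$.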
Exercise 3-5.2

The resistance values in a supply of resistors are independent random variables that have a Gaussian probability density function with a mean of 100 Ω and standard deviation of 5 Ω. Two resistors are selected at random and connected in series.
a) Find the most probable value of resistance of the series combination.
b) Find the probability that the series resistance will exceed 210 Ω.
Answers: 200, 0.0786

3-6 Probability Density Function of a Function of Two Random Variables

A more general problem than that considered in the previous sections is that of finding the probability density function of random variables that are functions of other random variables. Let X and Y be two random variables with joint probability density function f(x, y) and let two new random variables be defined as $Z = \varphi_1(X, Y)$ and $W = \varphi_2(X, Y)$ with the inverse relations $X = \psi_1(Z, W)$ and $Y = \psi_2(Z, W)$. Let g(z, w) be the joint probability density function of Z and W and consider the case where as X and Y increase both Z and W also increase. The probability that x and y lie in a particular region is equal to the probability that z and w lie in a corresponding region. This can be stated mathematically as follows.

$$\Pr(z_1 < Z < z_2,\ w_1 < W < w_2) = \Pr(x_1 < X < x_2,\ y_1 < Y < y_2) \tag{3-35}$$

or equivalently

$$\int_{z_1}^{z_2}\int_{w_1}^{w_2} g(z,w)\,dz\,dw = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x,y)\,dx\,dy \tag{3-36}$$

Now making use of the transformation of coordinates theorem from advanced calculus, this expression can be written in terms of the relations between the variables as

$$\int_{z_1}^{z_2}\int_{w_1}^{w_2} g(z,w)\,dz\,dw = \int_{z_1}^{z_2}\int_{w_1}^{w_2} f[\psi_1(z,w), \psi_2(z,w)]\,J\,dz\,dw \tag{3-37}$$

where J is the Jacobian of the transformation between X, Y and Z, W. J is a determinant formed from the partial derivatives of the variables with respect to each other as follows.
$$J = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix} \tag{3-38}$$

In a similar manner it can be shown that when the transformed variables move in a direction opposite to the original variables, the Jacobian is negative and a minus sign appears in the transformation. The net result is that the same equation is valid for both cases, provided that the absolute value of the Jacobian is used. The final equation is then

$$\int_{z_1}^{z_2}\int_{w_1}^{w_2} g(z,w)\,dz\,dw = \int_{z_1}^{z_2}\int_{w_1}^{w_2} f[\psi_1(z,w), \psi_2(z,w)]\,|J|\,dz\,dw \tag{3-39}$$

and from this it follows that

$$g(z,w) = |J|\,f[\psi_1(z,w), \psi_2(z,w)] \tag{3-40}$$

As an illustration consider the case of a random variable that is the product of two random variables. Let Z = XY where X and Y are random variables having a joint probability density function, f(x, y). Further, assume that W = X so that (3-39) can be used. From these relations it follows that

$$z = xy, \qquad w = x$$
$$x = w, \qquad y = z/w$$

Then

$$J = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\[1mm] \dfrac{1}{w} & -\dfrac{z}{w^2} \end{vmatrix} = -\frac{1}{w} \tag{3-41}$$

and from (3-40) it follows that

$$g(z,w) = \frac{1}{|w|}\,f\left(w, \frac{z}{w}\right) \tag{3-42}$$

The marginal probability density function of Z is then found by integrating over the variable w and is given by

$$g(z) = \int_{-\infty}^{\infty} g(z,w)\,dw = \int_{-\infty}^{\infty} \frac{1}{|w|}\,f\left(w, \frac{z}{w}\right)dw \tag{3-43}$$
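When the integral (3-43) has no convenient closed form, it can simply be evaluated numerically. The Python sketch below assumes, purely for illustration, that X and Y are independent and uniform on (0, 1); in that case (3-43) reduces to $\int_z^1 dw/w = -\ln z$, which gives a reference value to compare against.

```python
import math

def f_joint(x, y):
    """Hypothetical joint density for illustration: X and Y independent and
    uniform on (0, 1), so f(x, y) = 1 on the unit square."""
    return 1.0 if (0.0 < x < 1.0 and 0.0 < y < 1.0) else 0.0

def g_product(z, n=20000):
    """Density of Z = XY from (3-43): integral of f(w, z/w)/|w| dw.
    Since this f vanishes unless 0 < w < 1, the midpoint rule is applied on (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        w = (k + 0.5) * h
        total += f_joint(w, z / w) / abs(w)
    return total * h

for z in (0.1, 0.5, 0.9):
    print(z, round(g_product(z), 4), round(-math.log(z), 4))  # numerical vs. -ln z
```

Only f_joint needs to change to treat another joint density, although densities with unbounded support would need a wider integration interval than (0, 1).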
It is not always possible to carry out analytically the integration required to find the transformed probability density function. In such cases numerical integration or simulation can be used to obtain numerical answers.

One application of (3-43) is in characterizing variations in the area of a surface whose dimensions are random variables. As an example, assume that a solar cell has dimensions of 10 cm by 10 cm and that the dimensions are uniformly distributed in an interval of ±0.5 mm about their mean values. It is desired to find the probability that the area of the solar cell is within ±0.5% of the nominal value of 100 cm². Assuming that the dimensions are statistically independent, the joint PDF is just the product of the marginal density functions and is given by

$$f(x,y) = \frac{1}{0.1}\cdot\frac{1}{0.1} = 100, \quad 9.95 < x,\ y < 10.05$$

Now define a new random variable, Z = XY, which is the area. From (3-43) the PDF of Z is given by

$$g(z) = \int_{-\infty}^{\infty}\frac{1}{|w|}\,f\left(w, \frac{z}{w}\right)dw = 100\int_{-\infty}^{\infty}\frac{1}{|w|}\,\mathrm{rect}\left(\frac{w - 10}{0.1}\right)\mathrm{rect}\left[\frac{(z/w) - 10}{0.1}\right]dw \tag{3-44}$$

To evaluate this integral it is necessary to determine the regions over which the rectangle functions are nonzero. Recall that rect(t) is unity for |t| < 0.5 and zero elsewhere. From this it follows that the first rect function is nonzero for 9.95 < w < 10.05. For the second rect function the interval over which it is nonzero is dependent on z and is given by

$$\frac{z}{10.05} < w < \frac{z}{9.95}$$

The range of z is 9.95² to 10.05². A sketch will show that for $9.95^2 \le z \le 9.95 \times 10.05$ the limits on the integral are (9.95, z/9.95) and for $9.95 \times 10.05 \le z \le 10.05^2$ the limits on the integral are (z/10.05, 10.05). From this it follows that

$$g(z) = \begin{cases} 100\displaystyle\int_{9.95}^{z/9.95}\frac{1}{w}\,dw = 100\ln\left(\frac{z}{9.95^2}\right), & 9.95^2 \le z \le 9.95 \times 10.05 \\[4mm] 100\displaystyle\int_{z/10.05}^{10.05}\frac{1}{w}\,dw = -100\ln\left(\frac{z}{10.05^2}\right), & 9.95 \times 10.05 \le z \le 10.05^2 \end{cases} \tag{3-45}$$

That this is a valid PDF can be checked by carrying out the integration to show that the area is unity.
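The unit-area check can be done numerically, and the same routine also gives the probability that the area lies within ±0.5% of nominal by integrating (3-45) directly over 99.5 < z ≤ 100.5. This Python sketch uses a midpoint rule with an arbitrarily chosen number of steps; it is an alternative to, not a reproduction of, the table-interpolation approach used with the distribution function.

```python
import math

LO, HI = 9.95, 10.05                  # limits of each dimension, cm
A, C, B = LO * LO, LO * HI, HI * HI
# Z = XY ranges over [A, B]; C = 9.95 x 10.05 is where the form of g(z) changes.

def g(z):
    """PDF (3-45) of the area Z = XY."""
    if A <= z <= C:
        return 100.0 * math.log(z / A)
    if C < z <= B:
        return -100.0 * math.log(z / B)
    return 0.0

def integrate(func, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

area = integrate(g, A, B)             # should be very close to 1
pr = integrate(g, 99.5, 100.5)        # Pr(99.5 < Z <= 100.5)
print(round(area, 5), round(pr, 4))
```

The probability comes out at about 0.751, consistent with the value of 0.75 quoted for this example.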
The probability that Z has a value less than any particular value is found from the distribution function, which is found by integrating (3-45) as follows. For $9.95^2 \le z \le 9.95 \times 10.05$,

$$F(z) = 100\int_{9.95^2}^{z}\ln\left(\frac{v}{9.95^2}\right)dv = 100\left\{z\ln\left(\frac{z}{9.95^2}\right) - z + 9.95^2\right\} \tag{3-46}$$
For $9.95 \times 10.05 \le z \le 10.05^2$,

$$F(z) = F(9.95 \times 10.05) - 100\left\{z\ln\left(\frac{z}{10.05^2}\right) - z - 9.95 \times 10.05\,\ln\left(\frac{9.95}{10.05}\right) + 9.95 \times 10.05\right\} \tag{3-47}$$

To determine the probability that the area is within ±0.5% of the nominal value it is necessary to determine the values of F(z) for which z is equal to 99.5 cm² and 100.5 cm². This can be done by evaluating (3-46) and (3-47) for these values of z. Another approach, which will be used here, is to produce a table of values of F(z) vs. z and then to interpolate the desired value from the table. The following MATLAB program calculates and plots f(z) and F(z) and also carries out the interpolation to find the desired probability using the linear interpolation function interp1.

%areapdf.m
%program to compute dist function of the product of two rv
z1=linspace(9.95^2, 9.95*10.05, 11); z2=linspace(10.05*9.95, 10.05^2, 11);
f=100*[log(z1/9.95^2), -log(z2(2:11)/10.05^2)];
F1=100*(z1.*log(z1/9.95^2)-z1+9.95^2*ones(size(z1)));
F2=-100*(z2.*log(z2/10.05^2)-z2-(9.95*10.05*log(9.95/10.05)-9.95*10.05)*ones(size(z2)))+F1(11)*ones(size(z2));
F=[F1,F2(2:11)];
z=[z1,z2(2:11)];
subplot(2,1,1); plot(z,f)
grid; xlabel('z'); ylabel('f(z)')
subplot(2,1,2); plot(z,F)
grid; xlabel('z'); ylabel('F(z)')
%find probability area is within +/-0.5% of nominal
Pr=interp1(z,F,100.5)-interp1(z,F,99.5)

The probability density function and probability distribution function are shown in Figure 3-8. The probability that the area is within 0.5% of the nominal value is 0.75.

Another way of determining the probability density function of a function such as Z in the previous example is by simulation. This can be done by generating samples of the random
- 157. 1 46 C H A PT E R 3 · SEV E RAL RANDOM VARIABLES1 .5f(z)O IL...���___J_����...L����.1..-���-=����-9a 99.5 1 00 1 00.s 1 01 1 01 .5z1 .5F(z)0 ---=:::::::;:_�...L����--����--1�����.L-�����99 99.5 1 00 1 00.5 1 01 1 01 .5zFigure 3-8 f(<.) and F(z).variables. and using them to compute the function ofthe random variables and thento determiriethe statistics of the samples of the resulting function. In the present instance this can be donequite readily. The accompanying MATLAB program, which can be attached to the end of thepreceding program, generates 2000 samples of the variables X and Y with the specified PDFs.The function Z = X Y is then calculated and its statistical behavior determined by computinga histogram. By dividing the ordinates of the histogram by the total number of points present,an approximation to the PDF is obtained. The resulting approximation to the PDF is shown inFigure 3-9 along with the theoretical value. It is seen that the agreement is very good. When thetails of the PDF fall off gradually it may be necessary to use a large number of points to obtainthe desired accuracy.
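As a cross-check on the 0.75 figure, the sketch below evaluates (3-46) and (3-47) directly at z = 99.5 and z = 100.5 instead of interpolating a table, and also estimates the same probability from 2000 simulated products, as just described. It is written in Python (standard library only) rather than MATLAB; it assumes, as in the example, that X and Y are independent and uniform on (9.95, 10.05), and the helper names `cdf_area` and `simulated_prob` are hypothetical. The closed-form result should come out very close to 0.75, and the simulated value should scatter around it.

```python
import math
import random

# X and Y are independent and uniform on (A, B); each density equals 10.
A, B = 9.95, 10.05
Z_LO, Z_MID, Z_HI = A * A, A * B, B * B


def cdf_area(z):
    """Distribution function of Z = XY, assembled from (3-46) and (3-47)."""
    if z <= Z_LO:
        return 0.0
    if z <= Z_MID:
        # (3-46)
        return 100.0 * (z * math.log(z / Z_LO) - z + Z_LO)
    if z <= Z_HI:
        # (3-47): continue from the value reached at z = A*B
        return cdf_area(Z_MID) - 100.0 * (
            z * math.log(z / Z_HI) - z - Z_MID * math.log(A / B) + Z_MID
        )
    return 1.0


def simulated_prob(lo, hi, n=2000, seed=1):
    """Fraction of n simulated products X*Y that fall in (lo, hi]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z = rng.uniform(A, B) * rng.uniform(A, B)
        if lo < z <= hi:
            hits += 1
    return hits / n


exact = cdf_area(100.5) - cdf_area(99.5)
print(f"closed form: {exact:.4f}")
print(f"simulation:  {simulated_prob(99.5, 100.5):.4f}")
```

Evaluating the closed forms directly avoids the interpolation error of an 11-point table, while the simulation needs no formulas at all; agreement between the two is a useful sanity check on both.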
Figure 3-9 Simulated and analytical probability density functions of Z = XY.

    n=1:2000;
    X=0.1*rand(size(n))+9.95;        % 2000 samples uniform on (9.95, 10.05)
    Y=0.1*rand(size(n))+9.95;
    Z=X.*Y;
    [h,zc]=hist(Z,21);               % 21-bin histogram counts and bin centers
    p=h/(length(n)*(zc(2)-zc(1)));   % scale counts to a density estimate
    clf
    plot(zc,p,'-',z,f,'-')
    grid; xlabel('z'); ylabel('f(z), p(z)')

Exercise 3-6.1

Two random variables X and Y have a joint probability density function of the form
$$f(x, y) = \begin{cases} 1, & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases}$$

Find the probability density function of Z = XY.

Answer: $-\ln(z)$

Exercise 3-6.2

Show that the random variables X and Y in Exercise 3-6.1 are independent and find the expected value of their product. Find E{Z} by integrating the function zf(z).

Answer: 1/4

3-7 The Characteristic Function

It is shown in Section 3-5 that the probability density function of the sum of two independent random variables can be obtained by convolving the individual density functions. When more than two random variables are summed, the resulting density function can obviously be obtained by repeating the convolution until every random variable has been taken into account. Since this is a lengthy and tedious procedure, it is natural to inquire whether there is some easier way.

When convolution arises in system and circuit analysis, it is well known that transform methods can be used to simplify the computation, since convolution then becomes a simple multiplication of the transforms. Repeated convolution is accomplished by multiplying more transforms together. Thus, it seems reasonable to try to use transform methods when dealing with density functions. This section discusses how to do it.

The characteristic function of a random variable X is defined to be

$$\phi(u) = E\left[e^{juX}\right] \tag{3-48}$$

and this expected value can be obtained from

$$\phi(u) = \int_{-\infty}^{\infty} f(x)\,e^{jux}\,dx \tag{3-49}$$

The right side of (3-49) is (except for a minus sign in the exponent) the Fourier transform of the density function f(x). The difference in sign for the characteristic function is traditional rather than fundamental, and makes no essential difference in the application or properties of the transform. By analogy to the inverse Fourier transform, the density function can be obtained from
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi(u)\,e^{-jux}\,du \tag{3-50}$$

To illustrate one application of characteristic functions, consider once again the problem of finding the probability density function of the sum of two independent random variables X and Y, where Z = X + Y. The characteristic functions for these random variables are

$$\phi_X(u) = \int_{-\infty}^{\infty} f_X(x)\,e^{jux}\,dx$$

and

$$\phi_Y(u) = \int_{-\infty}^{\infty} f_Y(y)\,e^{juy}\,dy$$

Since convolution corresponds to multiplication of transforms (characteristic functions), it follows that the characteristic function of Z is

$$\phi_Z(u) = \phi_X(u)\,\phi_Y(u)$$

The resulting density function for Z becomes

$$f_Z(z) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_X(u)\,\phi_Y(u)\,e^{-juz}\,du \tag{3-51}$$

This technique can be illustrated by reworking the example of the previous section, in which X was uniformly distributed and Y exponentially distributed. Since

$$f_X(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

the characteristic function is

$$\phi_X(u) = \int_0^1 (1)\,e^{jux}\,dx = \left.\frac{e^{jux}}{ju}\right|_0^1 = \frac{e^{ju} - 1}{ju}$$

Likewise,

$$f_Y(y) = \begin{cases} e^{-y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
so that

$$\phi_Y(u) = \int_0^{\infty} e^{-y}\,e^{juy}\,dy = \left.\frac{e^{(-1+ju)y}}{-1+ju}\right|_0^{\infty} = \frac{1}{1-ju}$$

Hence, the characteristic function of Z is

$$\phi_Z(u) = \phi_X(u)\,\phi_Y(u) = \frac{e^{ju} - 1}{ju(1-ju)}$$

and the corresponding density function is

$$f_Z(z) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{ju} - 1}{ju(1-ju)}\,e^{-juz}\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{ju(1-z)}}{ju(1-ju)}\,du - \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-juz}}{ju(1-ju)}\,du$$

$$f_Z(z) = \begin{cases} 1 - e^{-z}, & 0 < z < 1 \\ (e - 1)\,e^{-z}, & 1 < z < \infty \end{cases}$$

The integration can be carried out by standard inverse Fourier transform methods or by the use of tables.

Another application of the characteristic function is to find the moments of a random variable. Note that if φ(u) is differentiated, the result is

$$\frac{d\phi(u)}{du} = \int_{-\infty}^{\infty} f(x)\,(jx)\,e^{jux}\,dx$$

For u = 0, the derivative becomes

$$\left.\frac{d\phi(u)}{du}\right|_{u=0} = j\int_{-\infty}^{\infty} x\,f(x)\,dx = j\overline{X} \tag{3-52}$$

Higher order derivatives introduce higher powers of x into the integrand, so that the general nth moment can be expressed as

$$\overline{X^n} = (-j)^n \left.\frac{d^n\phi(u)}{du^n}\right|_{u=0} \tag{3-53}$$

If the characteristic function is available, this may be much easier than carrying out the required integrations of the direct approach.

There are some fairly obvious extensions of the above results. For example, (3-51) can be extended to an arbitrary number of independent random variables. If $X_1, X_2, \ldots, X_n$ are independent and have characteristic functions $\phi_1(u), \phi_2(u), \ldots, \phi_n(u)$, and if
$$Y = X_1 + X_2 + \cdots + X_n$$

then Y has a characteristic function of

$$\phi_Y(u) = \phi_1(u)\,\phi_2(u)\cdots\phi_n(u)$$

and a density function of

$$f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_1(u)\,\phi_2(u)\cdots\phi_n(u)\,e^{-juy}\,du \tag{3-54}$$

The characteristic function can also be extended to cases in which random variables are not independent. For example, if X and Y have a joint density function of f(x, y), then they have a joint characteristic function of

$$\phi_{XY}(u, v) = E\left[e^{j(uX + vY)}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,e^{j(ux + vy)}\,dx\,dy \tag{3-55}$$

The corresponding inversion relation is

$$f(x, y) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi_{XY}(u, v)\,e^{-j(ux + vy)}\,du\,dv \tag{3-56}$$

The joint characteristic function can be used to find the correlation between the random variables. Thus, for example,

$$E[XY] = \overline{XY} = -\left[\frac{\partial^2 \phi_{XY}(u, v)}{\partial u\,\partial v}\right]_{u=v=0} \tag{3-57}$$

More generally,

$$E\left[X^n Y^k\right] = (-j)^{n+k}\left[\frac{\partial^{n+k} \phi_{XY}(u, v)}{\partial u^n\,\partial v^k}\right]_{u=v=0} \tag{3-58}$$

The results given in (3-53), (3-56), and (3-58) are particularly useful in the case of Gaussian random variables, since the necessary integrations and differentiations can always be carried out. One of the valuable properties of Gaussian random variables is that moments and correlations of all orders can be obtained from a knowledge of only the first two moments and the correlation coefficient.

Exercise 3-7.1

For the two random variables in Exercise 3-5.1, find the probability density function of Z = X + Y by using the characteristic function.
Answer: Same as found in Exercise 3-5.1.

Exercise 3-7.2

A random variable X has a probability density function of the form

$$f(x) = 2e^{-2x}u(x)$$

Using the characteristic function, find the first and second moments of this random variable.

Answers: 1/2, 1/2

PROBLEMS

3-1.1 Two random variables have a joint probability distribution function defined by

$$F(x, y) = \begin{cases} 0, & x < 0,\ y < 0 \\ xy, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 1, & x > 1,\ y > 1 \end{cases}$$

a) Sketch this distribution function.
b) Find the joint probability density function and sketch it.
c) Find the joint probability of the event X ≤ … and Y > ….

3-1.2 Two random variables, X and Y, have a joint probability density function given by

$$f(x, y) = \begin{cases} kxy, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

a) Determine the value of k that makes this a valid probability density function.
b) Determine the joint probability distribution function F(x, y).
c) Find the joint probability of the event X ≤ … and Y > ….
d) Find the marginal density function, $f_X(x)$.

3-1.3 a) For the random variables of Problem 3-1.1, find E[XY].
b) For the random variables of Problem 3-1.2, find E[XY].

3-1.4 Let X be the outcome from rolling one die and Y the outcome from rolling a second die.
a) Find the joint probability of the event X ≤ 3 and Y > 3.
b) Find E[XY].
c) Find E[…].

3-2.1 A signal X has a Rayleigh density function and a mean value of 10 and is added to noise, N, that is uniformly distributed with a mean value of zero and a variance of 12. X and N are statistically independent and can be observed only as Y = X + N.
a) Find, sketch, and label the conditional probability density function, f(x|y), as a function of x for y = 0, 6, and 12.
b) If an observation yields a value of y = 12, what is the best estimate of the true value of X?

3-2.2 For the joint probability density function of Problem 3-1.2, find
a) the conditional probability density function f(x|y)
b) the conditional probability density function f(y|x).

3-2.3 A dc signal having a uniform distribution over the range from −5 V to +5 V is measured in the presence of an independent noise voltage having a Gaussian distribution with zero mean and a variance of 2 V².
a) Find, sketch, and label the conditional probability density function of the signal given the value of the measurement.
b) Find the best estimate of the signal voltage if the measurement is 6 V.
c) Find the best estimate of the noise voltage if the measurement is 7 V.

3-2.4 A random signal X can be observed only in the presence of independent additive noise N. The observed quantity is Y = X + N. The joint probability density function of X and Y is
$$f(x, y) = K \exp\left[-(x^2 + y^2 + 4xy)\right], \quad \text{all } x \text{ and } y$$

a) Find a general expression for the best estimate of X as a function of the observation Y = y.
b) If the observed value of Y is y = 3, find the best estimate of X.

3-3.1 For each of the following joint probability density functions, state whether the random variables are statistically independent and find E[XY].
a) $f(x, y) = \dfrac{kx}{y}$, …; $= 0$ elsewhere
b) $f(x, y) = k(x^2 + y^2)$, $0 \le x \le 1,\ 0 \le y \le 1$; $= 0$ elsewhere
c) $f(x, y) = k(xy + 2x + 3y + 6)$, $0 \le x \le 1,\ 0 \le y \le 1$; $= 0$ elsewhere

3-3.2 Let X and Y be statistically independent random variables. Let W = g(X) and V = h(Y) be any transformations with continuous derivatives on X and Y. Show that W and V are also statistically independent random variables.

3-3.3 Two independent random variables, X and Y, have Gaussian probability density functions with means of 1 and 2, respectively, and variances of 1 and 4, respectively. Find the probability that XY > 0.

3-4.1 Two random variables have zero mean and variances of 16 and 36. Their correlation coefficient is 0.5.
a) Find the variance of their sum.
b) Find the variance of their difference.
c) Repeat (a) and (b) if the correlation coefficient is −0.5.

3-4.2 Two statistically independent random variables, X and Y, have variances of $\sigma_X^2 = 9$ and $\sigma_Y^2 = 25$. Two new random variables are defined by

U = 3X + 4Y
V = 5X − 2Y
a) Find the variances of U and V.
b) Find the correlation coefficient of U and V.

3-4.3 A random variable X has a variance of 9 and a statistically independent random variable Y has a variance of 25. Their sum is another random variable Z = X + Y. Without assuming that either random variable has zero mean, find
a) the correlation coefficient for X and Y
b) the correlation coefficient for Y and Z
c) the variance of Z.

3-4.4 Three zero mean, unit variance random variables X, Y, and Z are added to form a new random variable, W = X + Y + Z. Random variables X and Y are uncorrelated, X and Z have a correlation coefficient of 1/2, and Y and Z have a correlation coefficient of −1/2.
a) Find the variance of W.
b) Find the correlation coefficient between W and X.
c) Find the correlation coefficient between W and the sum of Y and Z.

3-5.1 A random variable X has a probability density function of

$$f_X(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

and an independent random variable Y is uniformly distributed between −1.0 and 1.0.
a) Find the probability density function of the random variable Z = X + 2Y.
b) Find the probability that 0 < Z ≤ 1.

3-5.2 A commuter attempts to catch the 8:00 am train every morning although his arrival time at the station is a random variable that is uniformly distributed between 7:55 am and 8:05 am. The train's departure time from the station is also a random variable that is uniformly distributed between 8:00 am and 8:10 am.
a) Find the probability density function of the time interval between the commuter's arrival at the station and the train's departure time.
b) Find the probability that the commuter will catch the train.
c) If the commuter gets delayed 3 minutes by a traffic jam, find the probability that the train will still be at the station.

3-5.3 A sinusoidal signal has the form

$$X(t) = \cos(100t + \Theta)$$

where Θ is a random variable that is uniformly distributed between 0 and 2π. Another sinusoidal signal has the form

$$Y(t) = \cos(100t + \Psi)$$

where Ψ is independent of Θ and is also uniformly distributed between 0 and 2π. The sum of these two sinusoids, Z(t) = X(t) + Y(t), can be expressed in terms of its magnitude and phase as

$$Z(t) = A\cos(100t + \phi)$$

a) Find the probability that A > 1.
b) Find the probability that A .5 4.

3-5.4 Many communication systems connecting computers employ a technique known as "packet transmission." In this type of system, a collection of binary digits (perhaps 1000 of them) is grouped together and transmitted as a "packet." The time interval between packets is a random variable that is usually assumed to be exponentially distributed with a mean value that is the reciprocal of the average number of packets per second that is transmitted. Under some conditions it is necessary for a user to delay transmission of a packet by a random amount that is uniformly distributed between 0 and T. If a user is generating 100 packets per second, and his maximum delay time, T, is 1 ms, find
a) the probability density function of the time interval between packets
b) the mean value of the time interval between packets.

3-5.5 Two statistically independent random variables have probability density functions as follows:

$$f_X(x) = 5e^{-5x}u(x)$$
$$f_Y(y) = 2e^{-2y}u(y)$$

For the random variable Z = X + Y find
a) $f_Z(0)$
b) the value for which $f_Z(z)$ is greater than 1.0
c) the probability that Z is greater than 0.1.

3-5.6 A box contains resistors whose values are independent and are uniformly distributed between 100 and 120 Ω. If two resistors are selected at random and connected in series, find
a) the most probable value of resistance for the series combination
b) the largest value of resistance for the series combination
c) the probability that the series combination will have a resistance value greater than 220 Ω.

3-5.7 It is often said that an excellent approximation to a random variable having a Gaussian distribution can be obtained by averaging together 10 random variables having a uniform probability density function. Using numerical convolution of the probability density functions, find the probability density function of the sum of 10 random variables having a uniform distribution extending over (0, 1). Plot the resulting density function along with the Gaussian probability density function having the same mean and variance. (Hint: Use a small sampling interval such as 0.002 for good results.)

3-6.1 The random variables, X and Y, have a joint probability density function given by

$$f(x, y) = 4xy, \quad 0 < x < 1,\ 0 < y < 1$$

By transformation of variables, find the probability density function of Z = X + Y.

3-6.2 For the random variables in Problem 3-6.1, find a graphical approximation to the probability density function of Z using simulation and check the result by numerical convolution. (Hint: Use the technique described in Chapter 2 and Appendix G to obtain samples of the random variables X and Y from their marginal probability distribution functions and samples having a uniform distribution.)

3-7.1 A random variable X has a probability density function of the form

and an independent random variable Y has a probability density function of

$$f_Y(y) = 3e^{-3y}u(y)$$
Using characteristic functions, find the probability density function of Z = X + Y.

3-7.2 a) Find the characteristic function of a Gaussian random variable with zero mean and variance σ².
b) Using the characteristic function, verify the result in Section 2-5 for the nth central moment of a Gaussian random variable.

3-7.3 The characteristic function of the Bernoulli distribution is

$$\phi(u) = 1 - p + pe^{ju}$$

where p is the probability that the event of interest will occur at any one trial. Find
a) the mean value of the Bernoulli random variable
b) the mean-square value of the random variable
c) the third central moment of the random variable.

3-7.4 Two statistically independent random variables, X and Y, have probability density functions given by

$$f_X(x) = 5e^{-5x}u(x)$$
$$f_Y(y) = 2e^{-2y}u(y)$$

For the random variable Z = X + Y find
a) the probability density function of Z using the characteristic functions of X and Y
b) the first and second moments of Z using the characteristic function.

3-7.5 A random variable X has a probability density function of the form

$$f(x) = 2e^{-4|x|}$$

Using the characteristic function, find the first and second moments of X.

References

See references for Chapter 1, particularly Clarke and Disney, Helstrom, and Papoulis.
CHAPTER 4
Elements of Statistics

4-1 Introduction

Now that we have completed an introductory study of probability and random variables, it is desirable to turn our attention to some of the important engineering applications of these concepts. One such application is in the field of statistics. Although our major objective in this text is to apply probabilistic concepts to the study of signals and systems, the field of statistics is of such importance to the engineer that it would not be appropriate to proceed without a brief discussion of the subject. Therefore, the objective of this chapter is to present a very brief introduction to some of the elementary concepts of statistics before turning all of our attention to signals and systems. It may be noted, however, that this material may be omitted without jeopardizing the understanding of subsequent chapters if time does not permit its inclusion.

Probability and statistics are often considered to be one and the same subject, and they are often linked together in courses and textbooks. However, they are really two different areas of study, even though statistics relies heavily upon probabilistic concepts. In fact, the usual definition of statistics makes no reference to probability. Instead, it defines statistics as the science of assembling, classifying, tabulating, and analyzing data or facts. In apparent agreement with this definition, a popular undergraduate textbook on statistics does not even discuss probability until the eighth chapter!

There are two general branches of statistics that are frequently designated as descriptive statistics and inductive statistics or statistical inference. Descriptive statistics involves collecting, grouping, and presenting data in a way that can be easily understood or assimilated.
Statistical inference, on the other hand, uses the data to draw conclusions about, or estimate parameters of, the environment from which the data came.

The field of statistics is very large and includes a great many areas of specialty. For our purposes, however, it is convenient to classify them into five theoretical areas:

1. Sampling theory, which deals with problems associated with selecting samples from some collection of data that is too large to be examined completely.
2. Estimation theory, which is concerned with making some estimate or prediction based on the data that are available.
3. Hypothesis testing, which attempts to decide which of two or more hypotheses about the data are true.
4. Curve fitting and regression, which attempts to find mathematical expressions that best represent the data.
5. Analysis of variance, which attempts to assess the significance of variations in the data and the relation of these variations to the physical situations from which the data arose.

One cannot hope to cover all of these topics in one brief chapter, so we will limit our attention to some simple concepts associated with sampling theory, a brief exposure to hypothesis testing, and a short discussion and example of linear regression.

4-2 Sampling Theory-The Sample Mean

A problem that often arises in connection with quality control of manufactured items is determining whether the items are meeting the desired quality standards without actually testing all of them. Usually, the number of items being manufactured is so large that it would be impractical to test every one. The alternative is to test only a few items and hope that these few are representative of all the items. Similar problems arise in connection with taking polls of public opinion, in determining the popularity of certain television programs, or in determining any sort of average about the general population.

Problems of the type listed above are solved by sampling the collection of items or facts that is being considered. A sufficient number of samples must be taken in order to obtain an answer in which one has reasonable confidence. Clearly, one would not predict the outcome of a presidential election by taking the result of asking the first person met on the street. Nor would one claim that one million transistors are all good or all bad on the basis of testing only one of them.
On the other hand, it may be very expensive and time consuming to take samples; thus, it is important not to take more samples than are actually required. One of the purposes of this section is to determine how many samples are required for a given degree of confidence in the result.

It is necessary to introduce some terminology in connection with sampling. The collection of data that is being studied is known as the population. For example, if a production line is set up to make a particular device, then all of these devices that are produced in a given run become the population. If one is concerned with predicting the outcome of an election, then the population is all persons voting in that election. The number of items or pieces of data that make up the population is designated as N. This is said to be the size of the population. If N is not a very large number, then its value may be significant. On the other hand, if N is very large it is often convenient to assume that it is infinite. The calculations for infinite populations are somewhat easier to carry out than for finite values of N, and, as will be seen, for very large N it makes very little difference whether the actual value of N is used or N is assumed to be infinite.

A sample, or more precisely a random sample, is simply part of the population that has been selected at random. As mentioned in Chapter 1, the term "selected at random" implies that all members of the population are equally likely to be selected. This is a very important consideration, and one must often go to considerable difficulty to ensure that all members of the
population do have an equal probability of being selected. The number of items or pieces of data in the sample is denoted as n and is called the size of the sample.

There are a number of calculations that can be made with the members of the sample, and one of the most important of these is the sample mean. For most engineering purposes, every item in the sample can be assigned a numerical value. Obviously, there are other types of samples, such as might arise in public opinion sampling, where numerical values cannot be assigned; we are not going to be concerned with such situations. For our purposes, let us assume that we have a sample of size n drawn from a population of size N, and that each element of the sample has a numerical value that is designated by $x_1, x_2, \ldots, x_n$. For example, if we are testing bipolar transistors these x-values might be the dc current gain, β. We also assume that we have a truly random sample, so that the elements we have are truly representative of the entire population. The sample mean is simply the average of the numerical values that make up the sample. Hopefully, this average value will be close to the average value of the population from which the sample is drawn. How close it might be is one of the problems addressed here.

When one has a particular sample, the sample mean is denoted by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{4-1}$$

where the $x_i$ are the particular values in the sample. More generally, however, we are interested in describing the statistical properties of arbitrary random samples rather than those of any particular sample. In this case, the sample mean becomes a random variable, as do the members of the sample. Thus, it is appropriate to denote the sample mean as

$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{4-2}$$

where the $X_i$ are random variables from the population and each is assumed to have the population probability density function f(x).
Note that the notation here is consistent with that used previously in connection with random variables; capital letters are used for random variables and lower case letters for possible values of the random variable. This notation is used throughout this chapter, and it is important to distinguish general results, which deal with random variables, from specific cases, in which particular values are used.

The true mean value of the population from which the sample came is denoted by $\overline{X}$. Hopefully, the sample mean will be close to this value. Since the sample mean, in the general case, is a random variable, it also has a mean value. Thus,

$$E[\hat{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \overline{X} = \overline{X}$$
It is clear from this result that the mean value of the sample mean is equal to the true mean value of the population. It is said, therefore, that the sample mean is an unbiased estimate of the population mean. The term "unbiased estimate" is one that arises often in the study of statistics, and it simply implies that the mean value of the estimate of any parameter is the same as the true mean value of the parameter.

Although it is certainly desirable for the sample mean to be an unbiased estimate of the true mean, this is not sufficient to indicate whether the sample mean is a good estimator of the true population mean. Since the sample mean is itself a random variable, it will have a value that fluctuates around the true population mean as different samples are drawn. Therefore, it is desirable to know something about the magnitude of this fluctuation, that is, to determine the variance of the sample mean. This is done first for the case in which the population size is very much greater than the sample size, that is, N ≫ n. In such cases, it is reasonable to assume that the characteristics of the population do not change as the sample is drawn. It is also equivalent to assuming that N = ∞.

To calculate the variance, we look at the difference between the mean-square value of $\hat{X}$ and the square of the mean value of $\hat{X}$, which, as we have just seen, is the true mean of the population, $\overline{X}$. Thus,

$$\operatorname{Var}(\hat{X}) = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right] - (\overline{X})^2 \tag{4-3}$$

Since $X_i$ and $X_j$ are parameters of different items in the population, it is reasonable to assume that they are statistically independent random variables when i ≠ j. Hence, it follows that

$$E[X_i X_j] = \begin{cases} \overline{X^2}, & i = j \\ (\overline{X})^2, & i \ne j \end{cases}$$

Using this result in (4-3) leads to

$$\operatorname{Var}(\hat{X}) = \frac{1}{n^2}\left[n\overline{X^2} + (n^2 - n)(\overline{X})^2\right] - (\overline{X})^2 = \frac{\overline{X^2} - (\overline{X})^2}{n} = \frac{\sigma^2}{n} \tag{4-4}$$

where σ² is the true variance of the population. Note that the variance of the sample mean can be made small by making n large.
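A quick simulation makes the unbiasedness result and (4-4) concrete. The Python sketch below (standard library only) draws many independent samples of size n, which corresponds to an infinite population or to sampling with replacement, and compares the mean and variance of the resulting sample means with $\overline{X}$ and σ²/n. The exponential population with mean 2 and variance 4, the sample sizes 4, 16, and 64, the 10,000 trials, and the helper name `sample_mean_stats` are all arbitrary illustrative choices, not taken from the text.

```python
import random
import statistics

# Assumed population: exponential with mean 2, hence variance 4.
POP_MEAN, POP_VAR = 2.0, 4.0


def sample_mean_stats(n, trials, rng):
    """Draw `trials` independent samples of size n from the population and
    return the mean and the variance of the resulting sample means."""
    means = [
        statistics.fmean(rng.expovariate(1.0 / POP_MEAN) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.fmean(means), statistics.pvariance(means)


rng = random.Random(42)
results = {}
for n in (4, 16, 64):
    m, v = sample_mean_stats(n, 10_000, rng)
    results[n] = (m, v)
    print(f"n={n:3d}  mean of sample means = {m:.3f} (true {POP_MEAN})   "
          f"variance = {v:.4f} (sigma^2/n = {POP_VAR / n:.4f})")
```

The average of the sample means should stay near 2 for every n, while their variance should shrink roughly in proportion to 1/n, even though the population itself is far from Gaussian.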
Consequently, large sample sizes lead to a better estimate of the population mean, since the expected value of the sample mean is always equal to the true
population mean, regardless of sample size, but the variance of the sample mean decreases as n gets large.

As noted previously, the result given in (4-4) assumed that N was very large. There is an alternative approach to sampling that leads to the same result as assuming a large population. Recall that the basic reason for assuming that the population size is very large is to ensure that the statistical characteristics of the population do not change as we withdraw the members of the sample. For example, suppose we have a population consisting of five 10-Ω resistors and five 100-Ω resistors. Withdrawing even one resistor will leave the remaining population with a significantly different proportion of the two resistor types. However, if the population consisted of one million 10-Ω resistors and one million 100-Ω resistors, then withdrawing one resistor, or even a thousand resistors, is not going to alter the composition of the remaining population significantly. The same sort of freedom from changing population characteristics can be achieved by replacing an item that is withdrawn after it has been examined, tested, and recorded. Since every item is drawn from exactly the same population, the effect of having an infinite population is achieved. Of course, one may select an item that has already been examined, but if the selection is done in a truly random fashion this will make no difference to the validity of the conclusions that might be drawn. Sampling done in this manner is said to be sampling with replacement.

There may be situations, of course, in which one may not wish to replace a sample or may be unable to replace it. For example, if the testing to be done is a life test, or a test that involves destroying the item, replacement is not possible. Similarly, in a public opinion poll or TV program survey, one simply does not wish to question the same person twice.
In such situations, it is still possible to calculate the variance of the sample mean even when the population size is quite small. The mathematical expression for this, which is simply quoted here without proof, is

$$\operatorname{Var}(\hat{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right) \tag{4-5}$$

Note that as N becomes very large, this expression approaches the previous one. Note also that if N = n, the variance of the sample mean becomes zero. This must be the case because this condition corresponds to every item in the population being sampled and, hence, the sample mean must be exactly the same as the population mean. It is clear, however, that one would not do this if destructive testing were involved!

Two examples serve to illustrate the above ideas. The first example considers a case in which the population size is infinite or very large. Suppose we have a random waveform such as illustrated in Figure 4-1 and we wish to estimate the mean value of this waveform, which, we shall assume, has a true mean value of 10 and a true variance of 9. As indicated in Figure 4-1, the value of this waveform is being sampled at equally spaced time instants $t_1, t_2, \ldots, t_n$. In the general situation, these sample values are random variables and are denoted by $X_i = X(t_i)$ for i = 1, 2, ..., n. We would like to find how many samples should be taken to estimate the mean value of this waveform with a standard deviation that is only one percent of the true mean value. If we assume that the waveform lasts forever, so that the population of time samples is infinite, then from (4-4)

$$\operatorname{Var}(\hat{X}) = \frac{\sigma^2}{n} = \frac{9}{n} = (0.01 \times 10)^2 = 0.01$$
Figure 4-1 A random waveform that is being sampled.

in which the two right-hand terms are the desired variance of the estimate and correspond to a standard deviation of 1% of the true mean. Thus,

$$n = \frac{9}{0.01} = 900$$

This result indicates that the sample size must be quite large in most cases of sampling an infinite population, or in sampling with replacement, if it is desired to obtain a sample mean with a small variance.

Of course, estimating the mean value of the random time function with the specified variance does not necessarily imply that the estimate is really within 1% of the true mean. It is possible, however, to determine the probability that the estimate of the mean is within 1% (or any amount) of the true mean. To do this, the probability density function of the estimate must be known. In the case of a large sample size, the central limit theorem comes to the rescue and assures us that, since the estimated mean is related to the sum of a large number of independent random variables, the sum is very nearly Gaussian regardless of the density function of the individual sample values. Thus, we can say that the probability that $\hat{X}$ is within 1% of $\overline{X}$ is

$$\Pr(9.9 < \hat{X} \le 10.1) = F(10.1) - F(9.9) = \Phi\left(\frac{10.1 - 10}{0.1}\right) - \Phi\left(\frac{9.9 - 10}{0.1}\right) = \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 = 2 \times 0.8413 - 1 = 0.6826$$

Hence, there is a significant probability (0.3174) that the estimate of the population mean is actually more than 1% away from the true population mean.

The assumption of a Gaussian probability density function for sample means is quite realistic when the sample size is large, but may not be very good for small sample sizes. A method of dealing with small sample sizes is discussed in a subsequent section.

The second example considers a situation in which the population size is not large and sampling is done without replacement. In this example, there is a population of 100 bipolar
transistors for which one wishes to estimate the mean value of the current gain, $\beta$. If the true population mean is $\bar{\beta} = 120$ and the true population variance is $\sigma_\beta^2 = 25$, how large a sample size is required to obtain a sample mean that has a standard deviation that is 1% of the true mean? Since the desired variance of the sample mean is

$$\operatorname{Var}(\hat{\beta}) = (0.01 \times 120)^2 = 1.44$$

it follows from (4-5) that

$$1.44 = \frac{25}{n}\left(\frac{100 - n}{100 - 1}\right)$$

This may be solved for $n$ to yield $n = 14.92$, which implies a sample size of 15 since $n$ must be an integer. This relatively small sample size is a consequence of having a small population size. In this case, for example, a sample size of 100 (that is, sampling every item) would result in a variance of the sample mean of exactly zero.

It is also possible to calculate the probability that the sample mean is within 1% of the true population mean, but it is not reasonable in this case to assume that the sample mean has a Gaussian density function unless, of course, the original $\beta$ random variables are Gaussian. This is because the sample size of 15 is too small for the central limit theorem to be effective. As a rule of thumb, it is often assumed that a sample size of at least 30 is required to make the Gaussian assumption.
A technique for dealing with smaller sample sizes is considered when sampling distributions are discussed.

Exercise 4-2.1
An endless production line is turning out solid-state diodes and every 100th diode is tested for reverse current $I_{-1}$ and forward current $I_1$ at diode voltages of -1 and +1, respectively.
a) If the random variable $I_{-1}$ has a true mean value of $10^{-6}$ and a variance of $10^{-12}$, how many diodes must be tested to obtain a sample mean whose standard deviation is 5% of the true mean?
b) If the random variable $I_1$ has a true mean value of 0.1 and a variance of 0.0025, how many diodes must be tested to obtain a sample mean whose standard deviation is 2% of the true mean?
c) If the larger of the two numbers found in (a) and (b) is used for both tests, what will the standard deviations of the sample mean be for each test?
Answers: 625, 400, $2 \times 10^{-3}$, $4 \times 10^{-8}$
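The two sample-size calculations above, and the Gaussian probability that the estimate falls within 1% of the true mean, can be reproduced with a few lines of Python. This is a minimal sketch rather than anything given in the text. The names `required_sample_size` and `prob_within` are hypothetical, `statistics.NormalDist` requires Python 3.8 or later, and the Gaussian assumption is only justified in the large-sample case, as explained above.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma2, target_std, population_size=None):
    """Smallest integer n giving a sample-mean standard deviation <= target_std.

    Infinite population (or sampling with replacement): Var = sigma2 / n, as in (4-4).
    Finite population of size N, without replacement: Var = (sigma2 / n)(N - n)/(N - 1), as in (4-5).
    """
    target_var = target_std ** 2
    if population_size is None:
        n = sigma2 / target_var
    else:
        N = population_size
        # Solve target_var = sigma2 (N - n) / (n (N - 1)) for n.
        n = sigma2 * N / (target_var * (N - 1) + sigma2)
    # Small tolerance so floating round-off (e.g. 900.0000000001) does not bump n up by one.
    return math.ceil(n - 1e-9)


def prob_within(std_of_mean, half_width):
    """Pr(|sample mean - true mean| <= half_width), assuming the sample mean is Gaussian."""
    z = half_width / std_of_mean
    phi = NormalDist().cdf
    return phi(z) - phi(-z)


print(required_sample_size(9, 0.01 * 10))                          # 900 (waveform of Figure 4-1)
print(required_sample_size(25, 0.01 * 120, population_size=100))   # 15 (100 transistors)
print(round(prob_within(0.1, 0.1), 4))                             # 0.6827 (text: 0.6826 from a rounded table)
```

The same function also answers Exercise 4-2.1 (a) and (b) and Exercise 4-2.2 (a) and (b).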
Exercise 4-2.2
A population of 100 resistors is to be tested without replacement to obtain a sample mean whose standard deviation is 2% of the true population mean.
a) How large must the sample size be if the true population mean is 100 Ω and the true standard deviation is 5 Ω?
b) How large must the sample size be if the true population mean is 100 Ω and the true standard deviation is 2 Ω?
c) If the sample size is 8, what is the standard deviation of the sample mean for the population of part (b)?
Answers: 1, 6, 0.34

4-3 Sampling Theory-The Sample Variance

In the previous section, we discussed estimating the mean value of a population of random variables by averaging the values in the sample taken from that population. We also determined the variance of that estimate and indicated how it influenced the sample size. However, in addition to the mean value, we may also be interested in estimating the variance of the random variables in the population. A knowledge of the variance is important because it indicates something about the spread of values around the mean. For example, it is not sufficient to test resistors and find that the sample mean is very close to the desired resistance value. If the standard deviation of the resistance values is very large, then regardless of how close the sample mean is, many of the resistors can be quite far from the desired value. Hence, it is necessary to control the variance of the population as well as its mean.

There is also another reason for wanting to estimate the variance of the population. You may recall that the population variance is needed in order to determine the sample size required to achieve a desired variance of the sample mean. Initially, one may not know the population variance and, thus, not have any idea as to how large the sample size should be.
Estimating the population variance will at least provide some information as to how the sample size should be changed to achieve the desired results.

The sample variance is denoted initially by $S^2$, the change in notation being adopted in order to avoid undue notational complexity in distinguishing among the several variances. In terms of the random variables in the sample, $X_1, \ldots, X_n$, the sample variance may be defined as

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \frac{1}{n}\sum_{j=1}^{n} X_j\right)^2 \qquad (4\text{-}6)$$
Note that the second summation in this expression is just the sample mean, so the entire expression represents the sample mean of the square of the difference between the random variables and the sample mean.

The expected value of $S^2$ can be obtained by expanding the squared term in (4-6) and taking the expected value of each term in the expansion. The details are tedious, but the method is straightforward and the result is

$$E[S^2] = \frac{n-1}{n}\,\sigma^2 \qquad (4\text{-}7)$$

where $\sigma^2$ is the true variance of the population. Note that the expected value of the sample variance is not the true variance. Thus, this is a biased estimate of the variance rather than an unbiased one. For most applications, one would like to have an unbiased estimate of any parameter. Hence, it is desirable to see if an unbiased estimate can be achieved readily. From (4-7), it is clear that one need modify the original estimate only by the factor $n/(n-1)$. Therefore, an unbiased estimate of the population variance can be achieved by defining the sample variance as

$$\tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{X}\right)^2 \qquad (4\text{-}8)$$

Both of the above results have assumed that the population size is very large, i.e., $N = \infty$. When the population is not large, the expected value of $S^2$ is given by

$$E[S^2] = \frac{N}{N-1}\cdot\frac{n-1}{n}\,\sigma^2 \qquad (4\text{-}9)$$

Note that this is also a biased estimate, but that the bias can be removed by defining $\tilde{S}^2$ as

$$\tilde{S}^2 = \frac{N-1}{N}\cdot\frac{n}{n-1}\,S^2 \qquad (4\text{-}10)$$

Note that both of these results reduce to the previous ones as $N \to \infty$.

The variance of the estimates of variance can also be obtained by straightforward, but tedious, methods. For example, it can be shown that the variance of $S^2$ is given by

$$\operatorname{Var}(S^2) = \frac{\mu_4 - \sigma^4}{n} \qquad (4\text{-}11)$$

where $\mu_4$ is the fourth central moment of the population and is defined by

$$\mu_4 = E\left[(X - \bar{X})^4\right] \qquad (4\text{-}12)$$
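The estimators (4-6), (4-8) and (4-10) translate directly into code. The sketch below uses hypothetical helper names and only the Python standard library. It checks (4-7) by simulation, drawing many Gaussian samples of size n = 5 from a population with variance 9 and averaging the two estimates: the biased one should average near (n - 1)σ²/n = 7.2 and the unbiased one near 9. It also evaluates the variance of the unbiased estimate, which is (4-11) scaled by (n/(n - 1))², the result quoted as (4-13) in the text that follows. For that evaluation it uses the Gaussian fourth moment μ4 = 3σ⁴.

```python
import random
import statistics


def biased_sample_variance(xs):
    """S^2 of (4-6): average squared deviation from the sample mean (divides by n)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def unbiased_sample_variance(xs, population_size=None):
    """(4-8) for an infinite population; (4-10) when the population has N members."""
    n = len(xs)
    s2 = biased_sample_variance(xs) * n / (n - 1)
    if population_size is not None:
        N = population_size
        s2 *= (N - 1) / N
    return s2


def var_of_unbiased_variance(n, sigma2, mu4):
    """Variance of the unbiased estimate: (n/(n-1))^2 * (mu4 - sigma^4) / n."""
    return n * (mu4 - sigma2 ** 2) / (n - 1) ** 2


# Monte Carlo check of (4-7): Gaussian population, mean 10, variance 9, samples of size 5.
rng = random.Random(1)  # fixed seed so the run is repeatable
n, trials, sigma = 5, 20000, 3.0
samples = [[rng.gauss(10.0, sigma) for _ in range(n)] for _ in range(trials)]
avg_biased = statistics.fmean(biased_sample_variance(s) for s in samples)
avg_unbiased = statistics.fmean(unbiased_sample_variance(s) for s in samples)
print(round(avg_biased, 2), round(avg_unbiased, 2))  # near 7.2 and 9.0

# Figure 4-1 waveform, n = 900, Gaussian so mu4 = 3 sigma^4 = 3 * 81
print(round(var_of_unbiased_variance(900, 9, 3 * 81), 4))  # 0.1804
```

For an infinite population, `biased_sample_variance` and `unbiased_sample_variance` agree with the standard library's `statistics.pvariance` and `statistics.variance`, respectively.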
The variance of $\tilde{S}^2$ follows immediately from (4-8) and (4-11) as

$$\operatorname{Var}(\tilde{S}^2) = \frac{n(\mu_4 - \sigma^4)}{(n-1)^2} \qquad (4\text{-}13)$$

Only the large sample size case will be considered to illustrate an application of the above results. For this purpose, consider again the random time function displayed in Figure 4-1, for which the sample mean has been discussed. It was found in that discussion that a sample size of 900 is required to reduce the standard deviation of the sample mean to a value that is 1% of the true mean. Now suppose this same sample of size 900 is used to determine the sample variance; specifically, we will use it to calculate $\tilde{S}^2$ as defined in (4-8). Recall that $\tilde{S}^2$ is an unbiased estimate of the population variance. The variance of this estimate can now be evaluated from (4-13) if we know the fourth central moment. Unfortunately, the fourth central moment is not easily obtained unless we know the probability density function of the random variables. For the purpose of this discussion, let us assume that the random waveform under consideration is Gaussian and that the random variables that make up the sample are mutually statistically independent. From equation (2-27) in Section 2-5, we know that the fourth central moment of a Gaussian random variable is just $3\sigma^4$. Using this value in (4-13), and remembering that for this waveform $\sigma^2$ is 9, leads to

$$\operatorname{Var}(\tilde{S}^2) = \frac{900\,(3 \times 9^2 - 9^2)}{(900 - 1)^2} = 0.1804$$

This value of variance corresponds to a standard deviation of 0.4247, which is 4.72% of the true population variance. One conclusion that can be drawn from this example, and which turns out to be fairly true in general, is that it takes a larger sample size to achieve a given accuracy in estimating the population variance than it does to estimate the population mean.

It is also possible to determine the probability that the sample variance is within any specified region if the probability density function of $\tilde{S}^2$ is known.
In the large sample size case, this probability density function may be assumed Gaussian as is done in the case of the sample mean. In the small sample size case, this is not reasonable. In fact, if the original random variables are Gaussian, the scaled quantity $(n-1)\tilde{S}^2/\sigma^2$ has a chi-squared probability density function with $n - 1$ degrees of freedom for any sample size. Another situation is discussed in a subsequent section.

Exercise 4-3.1
For the random waveform of Figure 4-1, find the sample size that would be required to estimate the true variance of the waveform with
a) a variance of 1% of the true variance if an unbiased estimator is used
b) a variance of 1% of the true variance if a biased estimator is used.
Answers: 1801, 1802

4-4 Sampling Distributions and Confidence Intervals

Although the mean and variance of any estimate of a population parameter do give useful information about the population, they are not sufficient to answer questions about the probability that these estimates are within specified bounds. To answer these questions, it is necessary to know the probability density functions associated with parameter estimates such as the sample mean or the sample variance. A great deal of effort has been expended in the study of statistics to determine these probability density functions and many such functions are described in the literature. Only two probability density functions are discussed here and these are discussed only for sample means.

The sample mean is defined in (4-2) as

$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

where $n$ is the sample size and $X_i$ are random variables from the population. If the $X_i$ are Gaussian and independent, with a mean of $\bar{X}$ and a variance of $\sigma^2$, then the normalized random variable $Z$, defined by

$$Z = \frac{\hat{X} - \bar{X}}{\sigma/\sqrt{n}} \qquad (4\text{-}14)$$

is Gaussian with zero mean and unit variance. Thus, when the population is Gaussian, the sample mean is also Gaussian regardless of the size of the population or the size of the sample, provided that the true population standard deviation is known so that it can be used in (4-14) to normalize the random variable. If the population is not Gaussian, the central limit theorem assures us that $Z$ is asymptotically Gaussian as $n \to \infty$. Hence, for large $n$, the sample mean may still be assumed to be Gaussian.
Also, if the true population variance is not known, the $\sigma$ in (4-14) may be replaced by its estimate, $\tilde{S}$, since this estimate should be close to the true value for large $n$. The questions that arise in this case, however, are how large does $n$ have to be and what does one do if $n$ is not this large?

A rule of thumb that is often used is that the Gaussian assumption is reasonable if $n \ge 30$. If the sample size is less than 30, and if the population random variables are not Gaussian, very little can be said in general, and each situation must be examined in the light of its own particular characteristics. However, if the population random variables are Gaussian and the true population variance is not known, the normalized sample mean is no longer Gaussian because the $\tilde{S}$ that
is used to replace $\sigma$ in (4-14) is also a random variable. It is possible to specify the probability density function of the normalized sample mean, however, and this topic is considered next.

When $n < 30$, define the normalized sample mean as

$$T = \frac{\hat{X} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{\hat{X} - \bar{X}}{S/\sqrt{n-1}} \qquad (4\text{-}15)$$

The random variable $T$ is said to have a Student's t distribution¹ with $n - 1$ degrees of freedom. To define the Student's t probability density function, let $\nu = n - 1$ be denoted as the degrees of freedom. The density function then is defined by

$$f_T(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2} \qquad (4\text{-}16)$$

where $\Gamma(\cdot)$ is the gamma function, some of whose essential properties are discussed below. This density function, for $\nu = 1$, is displayed in Figure 4-2, along with the normalized Gaussian density function for purposes of comparison. It may be noted that the Student's t density function has heavier tails than does the Gaussian density function. However, when $n \ge 30$, the two density functions are almost indistinguishable.

[Figure 4-2: Comparison of Student's t and Gaussian probability density functions.]

¹ The Student's t distribution was discovered by William Gosset, who published it using the pen name "Student" because his employer, the Guinness Brewery, had a strict rule against their employees publishing their discoveries under their own names.

To evaluate the Student's t density function it is necessary to evaluate the gamma function. Fortunately, this can be done readily in this case by noting a few special relations. First, there is a recursion relation of the form
$$\Gamma(k+1) = k\,\Gamma(k) \quad \text{for any } k, \qquad \Gamma(k+1) = k! \quad \text{for integer } k$$

Next, some special values of the gamma function are

$$\Gamma(1) = \Gamma(2) = 1, \qquad \Gamma(1/2) = \sqrt{\pi} \qquad (4\text{-}17)$$

Note that in evaluating the Student's t density function all arguments of the gamma function are either integers or one-half plus an integer. As an illustration of the application of (4-17), let $k = 3.5$. Thus

$$\Gamma(3.5) = 2.5\,\Gamma(2.5) = 2.5 \cdot 1.5\,\Gamma(1.5) = 2.5 \cdot 1.5 \cdot 0.5\,\Gamma(0.5) = 2.5 \cdot 1.5 \cdot 0.5\,\sqrt{\pi} = 3.323$$

The concept of a confidence interval is one that arises very often in the study of statistics. Although the confidence interval is most appropriately considered in connection with estimation theory, it is convenient to discuss it here as an application of the probability density function of the sample mean. The sample mean, as we defined it, is really a point estimate in the sense that it assigns a single value to the estimate. The alternative to a point estimate is an interval estimate, in which the parameter being estimated is declared to lie within a certain interval with a certain probability. This interval is the confidence interval.

More specifically, a q-percent confidence interval is the interval within which the estimate will lie with a probability of $q/100$. The limits of this interval are the confidence limits and the value of $q$ is said to be the confidence level.

When considering the sample mean, the q-percent confidence interval is defined as

$$\bar{X} - \frac{k\sigma}{\sqrt{n}} \le \hat{X} \le \bar{X} + \frac{k\sigma}{\sqrt{n}} \qquad (4\text{-}18)$$

where $k$ is a constant that depends upon $q$ and the probability density function of $\hat{X}$. Specifically,

$$q = 100\int_{\bar{X} - k\sigma/\sqrt{n}}^{\bar{X} + k\sigma/\sqrt{n}} f_{\hat{X}}(x)\,dx \qquad (4\text{-}19)$$

For the Gaussian density function, the values of $k$ can be tabulated readily as a function of the confidence level.
A very limited table of this sort is given in Table 4-1. As an illustration of the use of this table, consider once again the random waveform of Figure 4-1, for which the true population mean is 10, the true population variance is 9, and 900 samples are taken. The 95% confidence interval is just

$$10 - \frac{1.96\sqrt{9}}{\sqrt{900}} \le \hat{X} \le 10 + \frac{1.96\sqrt{9}}{\sqrt{900}}, \quad\text{that is,}\quad 9.804 \le \hat{X} \le 10.196$$
Thus, there is a probability of 0.95 that the sample mean will lie in the interval between 9.804 and 10.196.

Table 4-1 Confidence Interval Width for a Gaussian Density Function

q (%)     k
90        1.64
95        1.96
99        2.58
99.9      3.29
99.99     3.89

It is worth noting that large confidence levels correspond to wide confidence intervals. Hence, there is a small probability that an estimate will lie within a very narrow confidence interval, but a large probability that it will lie within a broad confidence interval. It follows, therefore, that a 99% confidence level represents a poorer estimate than does, say, a 90% confidence level when the same sample sizes are being compared.

The same information regarding confidence intervals can be obtained from the probability distribution function. Note that the integral in (4-19) can be replaced by the difference of two distribution functions. Hence, this relation could have been written as

$$q = 100\left[F_{\hat{X}}\left(\bar{X} + \frac{k\sigma}{\sqrt{n}}\right) - F_{\hat{X}}\left(\bar{X} - \frac{k\sigma}{\sqrt{n}}\right)\right] \qquad (4\text{-}20)$$

It is also possible to tabulate k-values for the Student's t distribution, but a different set of values is required for each value of $\nu$, the degrees of freedom. However, it is customary to present this information in terms of the probability distribution function. A modest table of these values is given in Appendix F, while a much smaller table for the particular case of eight degrees of freedom is given in Table 4-2 to assist in the discussion that follows. The application of this table to several aspects of hypothesis testing is discussed in the next section.

Table 4-2 Probability Distribution for Student's t Function (ν = 8)

t         F_T(t)
0.262     0.60
0.706     0.75
1.397     0.90
1.860     0.95
2.306     0.975
2.896     0.99
3.355     0.995
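The gamma-function relations (4-17), the Student's t density (4-16) and the Gaussian confidence interval (4-18) are easy to check numerically. The sketch below is an illustration only, and `gamma_half_integer`, `student_t_pdf` and `gaussian_confidence_interval` are hypothetical names. The density is computed with Python's `math.lgamma`, not the recursion, so that large ν does not overflow. The k value is taken from Table 4-1.

```python
import math


def gamma_half_integer(k):
    """Gamma(k) for k a positive integer or an integer plus 1/2, via the relations (4-17)."""
    if k < 0.5 or 2 * k != int(2 * k):
        raise ValueError("k must be a positive integer or an integer plus 1/2")
    if k in (1, 2):
        return 1.0
    if k == 0.5:
        return math.sqrt(math.pi)
    return (k - 1) * gamma_half_integer(k - 1)  # Gamma(k) = (k - 1) Gamma(k - 1)


def student_t_pdf(t, nu):
    """Student's t density (4-16); lgamma avoids overflow when nu is large."""
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return math.exp(log_ratio) / math.sqrt(math.pi * nu) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def gaussian_confidence_interval(true_mean, sigma, n, k):
    """Interval (4-18) for the sample mean; k comes from Table 4-1 (1.96 for 95%)."""
    half_width = k * sigma / math.sqrt(n)
    return true_mean - half_width, true_mean + half_width


print(round(gamma_half_integer(3.5), 3))  # 3.323
print(round(student_t_pdf(1, 4), 4))      # 0.2147
print(round(student_t_pdf(1, 9), 4))      # 0.2291
lo, hi = gaussian_confidence_interval(10, 3, 900, 1.96)
print(round(lo, 3), round(hi, 3))         # 9.804 10.196
```

The two density values are those quoted as answers to Exercise 4-4.1. For very large ν, `student_t_pdf(0, nu)` approaches the Gaussian peak value $1/\sqrt{2\pi}$, consistent with the remark that the two densities become almost indistinguishable.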
Exercise 4-4.1
Calculate the value of the Student's t probability density function at t = 1 for
a) 4 degrees of freedom
b) 9 degrees of freedom.
Answers: 0.2147, 0.2291

Exercise 4-4.2
A very large population of resistor values has a true mean of 100 Ω and a sample standard deviation of 4 Ω. Find the confidence limits on the sample mean for a confidence level of 95% if it is computed from
a) a sample size of 100
b) a sample size of 9.
Answers: 97.52 to 102.48; 99.22 to 100.78

4-5 Hypothesis Testing

One of the important applications of statistics is making decisions about the parameters of a population. In the preceding sections we have seen how to estimate the mean value or the variance of a population and how to assign confidence intervals to these estimates for any specified level of confidence. The next step is to make some hypothesis about the population and then determine if the observed sample confirms or rejects this hypothesis. For example, a manufacturer may claim that the light bulbs he produces have an average lifetime of 1000 hours. The hypothesis is then made that the mean value of this population (i.e., the lifetimes of all light bulbs produced) is 1000 hours. Since it is not possible to run life tests on all the light bulbs produced, a small fraction is tested and the sample mean determined. The question then is: does the result of this test verify the hypothesis? To take an extreme example, suppose only two light bulbs are tested and the sample mean is found to be 900 hours. Does this prove that the hypothesis about the average lifetime of the population of all light bulbs is false? Probably not, because the sample size is too small to be able to make a reasonable decision. On the other hand, suppose the sample mean of these two light bulbs is 1000 hours. Does this prove that the hypothesis is correct? Again, the answer is probably not. The question then becomes: how does one decide to accept or reject a given
hypothesis when the sample size and the confidence level are specified? We
now have the background necessary to answer that question and will do so in several specific cases by means of examples.

One way of classifying hypothesis tests is based on whether they are one-sided or two-sided. In a one-sided test, one is concerned with what happens on one side of the desired value of the parameter. For example, in the light bulb situation above, we are concerned only if the average lifetime is less than 1000 hours and would be happy to have the average lifetime greater than 1000 hours by any amount. There are many other situations of a comparable nature. On the other hand, in a two-sided test we are concerned about deviations in either direction from the hypothesized value. For example, if we have a supply of 100-Ω resistors that we are testing, it is equally serious if the resistance is either too high or too low.

To consider the one-sided test first, imagine that a capacitor manufacturer claims that his capacitors have a mean value of breakdown voltage of 300 V or greater. We test the breakdown voltage of a sample of 100 capacitors and find that the sample mean is 290 V and the unbiased sample standard deviation, $S$, is 40 V. Is the manufacturer's claim valid if a 99% confidence level is used? Note that this is a one-sided test since we do not care how much greater than 300 V the mean value of breakdown voltage might be.

We start by making the hypothesis that the true mean value of the population is 300 V and then check to see if this hypothesis is consistent with the observed data. Since the sample size is greater than 30, the Gaussian assumption may be employed here, with $\sigma$ set equal to $S$. Thus, the value of the normalized random variable, $Z = z$, is

$$z = \frac{\hat{x} - \bar{x}}{\sigma/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{100}} = -2.5$$

For a one-sided confidence level of 99%, the critical value of $z$ is found from that value above which the area under $f_Z(z)$ is 0.99. That is,

$$\int_{z_c}^{\infty} f_Z(z)\,dz = 1 - \Phi(z_c) = 0.99$$

from which $z_c = -2.33$.
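The computation of $z$ and $z_c$ just described can be sketched in Python. This is an illustrative helper under stated assumptions, not a procedure given in the text. `mean_test` is a hypothetical name, and the Gaussian critical value comes from `statistics.NormalDist.inv_cdf` (Python 3.8 or later). For small samples the Student's t critical value is not computed: it must be supplied from a table such as Table 4-2. The same helper also covers the small-sample and two-sided (Zener diode) cases worked later in this section.

```python
import math
from statistics import NormalDist


def mean_test(sample_mean, hyp_mean, s, n, confidence, two_sided=False, t_critical=None):
    """Hypothesis test on a population mean, following the examples in this section.

    Statistic: (sample_mean - hyp_mean) / (s / sqrt(n)), s = unbiased sample std. dev.
    If t_critical is None the statistic is treated as Gaussian (large-sample case) and
    the critical value is computed; otherwise t_critical is read from a Student's t
    table (negative for the one-sided test, positive for the two-sided test).
    One-sided form tests "true mean >= hyp_mean"; two-sided tests "true mean == hyp_mean".
    Returns (statistic, critical value, hypothesis accepted?).
    """
    stat = (sample_mean - hyp_mean) / (s / math.sqrt(n))
    if two_sided:
        if t_critical is None:
            crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        else:
            crit = t_critical
        return stat, crit, -crit <= stat <= crit
    crit = t_critical if t_critical is not None else NormalDist().inv_cdf(1 - confidence)
    return stat, crit, stat >= crit


# Capacitor breakdown voltage, claim: mean >= 300 V, 99% confidence, one-sided
print(mean_test(290, 300, 40, 100, 0.99))                   # z = -2.5, z_c about -2.33, rejected
print(mean_test(290, 300, 40, 9, 0.99, t_critical=-2.896))  # t about -0.75, accepted
# Zener diodes, claim: mean = 10 V, 95% confidence, two-sided
print(mean_test(10.3, 10, 1.2, 100, 0.95, two_sided=True))  # z about 2.5, z_c about 1.96, rejected
print(mean_test(10.3, 10, 1.2, 9, 0.95, two_sided=True, t_critical=2.306))  # t about 0.75, accepted
```

Raising the confidence level to 99.5% moves the Gaussian critical value to about -2.576, and the large-sample capacitor test then accepts the hypothesis, as discussed in the text.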
Since the observed value of $z$ is less than $z_c$, we would reject the hypothesis; that is, we would say that the claim that the mean breakdown voltage is 300 V or greater is not valid.

An often confusing point in connection with hypothesis testing is the real meaning of the decision made. In the example above, the decision means that if the population really had a true mean of 300 V, a sample mean as low as the one observed would occur with a probability of less than 0.01, so the observed sample is judged not to have come from such a population. This seems clear enough; the confusing point, however, is that had we chosen a confidence level of 99.5% we would have accepted the hypothesis, because the critical value of $z$ for this level of confidence is $-2.575$ and the observed $z$-value is now greater than $z_c$. Thus, choosing a high confidence level makes it more likely that any given sample will result in accepting the hypothesis. This seems contrary to logic, but the reason is clear: a high confidence level results in a wider confidence interval because a greater fraction of the probability density function must be contained in it. Conversely, selecting a small confidence level makes it less likely that any given sample will result in accepting the hypothesis and, thus, is a more severe requirement. Because the use of the term confidence
level does seem to be contradictory, some statisticians prefer to use the level of significance, which is just the confidence level subtracted from 100%. Thus, a confidence level of 99% corresponds to a 1% level of significance, while a confidence level of 99.5% is only a 0.5% level of significance. A larger level of significance corresponds to a more severe test of the hypothesis.

The example concerning the capacitor breakdown voltage is now reconsidered when the sample size is small. Suppose we test only 9 capacitors and find that the mean value of breakdown voltage is 290 V and the unbiased sample standard deviation is 40 V. Note that these are the same values that were obtained with a large sample size. However, since the sample size is less than 30 we will use the $T$ random variable, which for this case is

$$t = \frac{\hat{x} - \bar{x}}{s/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{9}} = -0.75$$

For the Student's t density function with $\nu = n - 1 = 8$ degrees of freedom, the critical value of $t$ for a confidence level of 99% is, from Table 4-2, $t_c = -2.896$. Since the observed value of $t$ is now greater than $t_c$, we would accept the hypothesis that the true mean breakdown voltage is 300 V or greater.

Note that the use of a small sample size tends to increase the value of $t$ and, hence, makes it more likely to exceed the critical value. Furthermore, the small sample size leads to the use of the Student's t distribution, which has heavier tails than the Gaussian distribution and, thus, leads to a smaller value of $t_c$. Both of these factors together make small sample size tests less reliable than large sample size tests.

The next example considers a two-sided hypothesis test. Suppose that a manufacturer of Zener diodes claims that a certain type has a mean breakdown voltage of 10 V. Since a Zener diode is used as a voltage regulator, deviations from the desired value of breakdown voltage in either direction are equally undesirable.
Hence, we hypothesize that the true mean value of the population is 10 V and then seek a test that either accepts or rejects this hypothesis and utilizes the fact that deviations on either side of 10 V are of concern.

Considering a large sample size test first, suppose we test 100 Zener diodes and find that the sample mean is 10.3 V and the unbiased sample standard deviation is 1.2 V. Is the claim valid if a 95% confidence level is used? Since the sample size is greater than 30, we can use the Gaussian random variable, $Z$, which for this sample is

$$z = \frac{10.3 - 10}{1.2/\sqrt{100}} = 2.5$$

For a 95% confidence level, the critical values of the Gaussian random variable are, from Table 4-1, $\pm 1.96$. Thus, in order to accept the hypothesis it is necessary for $z$ to lie in the region $-1.96 \le z \le 1.96$. Since $z = 2.5$ does not lie in this interval, the hypothesis is rejected; that is, the manufacturer's claim is not considered valid at the 95% confidence level, since a sample mean this far from 10 V would be observed with a probability of less than 0.05 if the true mean really were 10 V.

This same test is now repeated with a small sample size. Suppose that 9 Zener diodes are tested and it is found that the mean value of their breakdown voltages is again 10.3 V and
the unbiased sample standard deviation is 1.2 V. The Student's t random variable now has a value of

$$t = \frac{\hat{x} - \bar{x}}{s/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{9}} = 0.75$$

The critical values of $t$ can be obtained from Table 4-2, since there are once again 8 degrees of freedom. Since Table 4-2 lists the distribution function for the Student's t random variable and we are interested in finding the interval around zero that contains 95% of the area, there will be 2.5% of the area above $t_c$ and 2.5% below $-t_c$. Thus, the value that we need from the table is that corresponding to 0.975. This is seen easily by noting that

$$\Pr[-t_c < T \le t_c] = F_T(t_c) - F_T(-t_c) = 2F_T(t_c) - 1 = 0.95$$

Therefore

$$F_T(t_c) = \frac{1.95}{2} = 0.975$$

From Table 4-2 the required value is $t_c = 2.306$. To accept the hypothesis, it is necessary that the observed value of $t$ lie in the range $-2.306 < t \le 2.306$. Since $t = 0.75$ does lie in this range, the hypothesis is accepted and the manufacturer's claim is considered to be valid. Again we see that a small sample test is not as severe as a large sample test.

Exercise 4-5.1
A certain type of bipolar transistor is claimed to have a mean value of current gain of $\beta \ge 225$. A sample of these transistors is tested and the sample mean value of current gain is found to be 210 and the unbiased sample standard deviation is 40. If a 97.5% confidence level is employed, is this claim valid if
a) the sample size is 81?
b) the sample size is 16?
Answers: z = 3.38, z_c = 2.31, no; t = 1.5, t_c = 2.13, yes

Exercise 4-5.2
A certain type of bipolar transistor is claimed to have a mean collector current of 10 mA. A sample of these transistors is tested and the sample mean
value of collector current is found to be 9.5 mA and the unbiased sample standard deviation is 0.8 mA. If a 97.5% confidence level is employed, is this claim valid if
a) the sample size is 81?
b) the sample size is 16?
Answers: z = -3.00, z_c = ±1.96, no; t = -1.33, t_c = ±0.269, yes.

4-6 Curve Fitting and Linear Regression

The topic considered in this section is considerably different from those in previous sections, but it does represent an important application of statistics in engineering problems. Frequently, statistical data reveal a relationship between two or more variables and it is desired to express this relationship in mathematical form by determining an equation that connects the variables. For example, one might collect data on the lifetime of light bulbs as a function of the applied voltage. Such data might be presented in the form of a scatter diagram, such as shown in Figure 4-3, in which each observed lifetime and the corresponding operating voltage are plotted as a point on a two-dimensional plane.

Also shown in Figure 4-3 is a solid curve that represents, in some sense, the best fit between the data points and a mathematical expression that relates the two variables. The objective of this section is to show one way of obtaining such a mathematical relationship.

[Figure 4-3: Scatter diagram of light bulb lifetimes and applied voltage (horizontal axis: applied voltage, V).]

For purposes of discussion, it is convenient to consider the two variables as $x$ and $y$. Since the data consist of specific numerical values, in keeping with our previously adopted notation, these data are represented by lower case letters. Thus, for a sample size of $n$ we would have values of one variable denoted as $x_1, x_2, \ldots, x_n$ and corresponding values of the other variable
as $y_1, y_2, \ldots, y_n$. For example, for the data displayed in Figure 4-3 each x-value might be an applied voltage and each y-value the corresponding lifetime.

The general problem of finding a mathematical relationship to represent the data is called curve fitting. The resulting curve is called a regression curve and the mathematical equation is the regression equation. To find a "best" regression equation it is first necessary to establish a criterion that will be used to define what is meant by "best." Consider the scatter diagram and regression curve shown in Figure 4-4.

In this figure, the difference between the regression curve and the corresponding value of $y$ at any $x$ is designated as $d_i$, $i = 1, 2, \ldots, n$. The criterion of goodness of fit that is employed here is that

$$d_1^2 + d_2^2 + \cdots + d_n^2 = \text{a minimum} \qquad (4\text{-}21)$$

Such a criterion leads to a least-squares regression curve and is the criterion that is most often employed. Note that the least-squares criterion weights errors on either side of the regression curve equally and also weights large errors more than small errors.

Having decided upon a criterion to use, the next step is to select the type of the equation that is to be fitted to the data. This choice is based largely on the nature of the data, but most often a polynomial of the form

$$y = a + bx + cx^2 + \cdots + kx^j$$

is used. Although it is possible to fit an $(n-1)$-degree polynomial to $n$ data points, one would never want to do this because it would provide no smoothing of the data. That is, the resulting polynomial would go through each data point and the resulting least-squares error would be zero. Since the data are random, one is more interested in a regression curve that approximates the mean value of the data.
Thus, in most cases, a first- or second-degree polynomial is employed. Our discussion in this section is limited to using a first-degree polynomial in order to preserve simplicity while conveying the essential aspects of the method. This technique is referred to as linear regression.

[Figure 4-4: Error between the regression curve and the scatter diagram.]
The linear regression equation becomes

$$y = a + bx \qquad (4\text{-}22)$$

in which it is necessary to determine the values of $a$ and $b$ that satisfy (4-21). These are determined by writing

$$\sum_{i=1}^{n}\left[y_i - (a + bx_i)\right]^2 = \text{a minimum}$$

To minimize this expression, one would differentiate partially with respect to $a$ and $b$ and set the derivatives equal to zero. This leads to two equations that may be solved simultaneously for the values of $a$ and $b$. The equations are

$$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i$$

and

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$

The resulting values of $a$ and $b$ are

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad (4\text{-}23)$$

and

$$a = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i\right) \qquad (4\text{-}24)$$

Although these are fairly complicated expressions, they can be evaluated readily by computer or programmable calculator. For example, MATLAB has a function p = polyfit(x,y,n) that generates a vector of coefficients, p, corresponding to the nth-order polynomial that fits the data vector, y, in a least-squares sense with the polynomial
$$p(x) = p(1)\,x^n + p(2)\,x^{n-1} + \cdots + p(n+1)$$

This is called a regression equation. The regression curve may be evaluated using the function y = polyval(p,x), where p is the vector of polynomial coefficients and x is the vector of elements at which the polynomial is to be evaluated. As an example, consider the following MATLAB program that generates a set of data representing a straight line, adds Gaussian noise, determines the coefficients of the linear regression equation, and plots the data and the regression curve.

```matlab
%Lstsqr.m
x = 0:0.5:10;                    % independent variable
a = 2; b = 4;                    % coef of straight line
y1 = a*ones(size(x)) + b*x;      % values of straight line
y2 = y1 + 5*randn(size(x));      % add noise
p = polyfit(x, y2, 1);           % regression coefficients
aest = p(2)                      % estimate of a
best = p(1)                      % estimate of b
y3 = polyval(p, x);              % values of regression curve
plot(x, y2, 'o', x, y3, '-')     % data points and regression line
xlabel('X'); ylabel('Y')
```

The resulting data and regression curve are shown in Figure 4-5.

As another example consider the data in Table 4-3, which represent the measured relationship between temperature and breakdown voltage of a sample of capacitors. A plot of these data indicates that they could not be well represented by a straight line, and so they will be fitted with a second-order polynomial. Using the polyfit function of MATLAB, the equation of the second-order regression curve is found to be

$$V_B = -0.0334\,T^2 - 0.6540\,T + 426.0500$$

Figure 4-6 shows the data and the second-order regression curve.

Table 4-3 Data for Breakdown Voltage versus Temperature

i     T, x_i    V_B, y_i
1     10        425
2     20        400
3     30        366
4     40        345
5     50        283
6     60        298
7     70        205
8     80        189
9     90        83
10    100       22
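For readers without MATLAB, the closed-form coefficients (4-23) and (4-24) can be evaluated directly. The sketch below uses a hypothetical function name and plain Python only, and returns a and b for y = a + bx. Applied to the light-bulb data of Exercise 4-6.1 it should reproduce the quoted answers b = -28.6 and a = 4385, as well as the predictions of Exercise 4-6.2. MATLAB's polyfit(x,y,1) returns the same two numbers, in the order [b a].

```python
def linear_regression(xs, ys):
    """Least-squares line y = a + b*x from the closed forms (4-23) and (4-24)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length data sets with at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # (4-23)
    a = (sum_y - b * sum_x) / n                                    # (4-24), second form
    return a, b


volts = [105, 110, 115, 120]      # x_i, Exercise 4-6.1
hours = [1400, 1200, 1120, 950]   # y_i, lifetimes in hours
a, b = linear_regression(volts, hours)
print(round(a, 1), round(b, 1))                     # 4385.0 -28.6
print([round(a + b * v) for v in (90, 112, 130)])   # [1811, 1182, 667]
```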
Figure 4-5 Example of a linear regression curve. [Plot of the noisy data points and the fitted line; horizontal axis X from 0 to 10, vertical axis Y.]

Figure 4-6 Regression curve fitting data of Table 4-3. [Plot of breakdown voltage versus ambient temperature with the second-order regression curve.]
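For a polynomial of degree n, the least-squares coefficients satisfy n + 1 linear "normal equations," the direct generalization of the two equations written above for the straight line. The sketch below (plain Python; `polyfit_normal_equations` is a hypothetical helper, not a library routine) builds and solves those equations for the Table 4-3 data and should agree with the second-order curve quoted for $V_B$ to the printed precision. Solving the normal equations directly becomes ill-conditioned as the degree grows, which is one reason library routines generally prefer orthogonal (QR-type) factorizations.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Least-squares polynomial of the given degree via the normal equations.

    Coefficients are returned highest power first, the same ordering MATLAB's polyfit uses.
    """
    m = degree + 1
    # Normal equations: sum_k (sum_i x_i^(j+k)) c_k = sum_i x_i^j y_i, for j = 0..degree
    mat = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    rhs = [sum(x ** j * y for x, y in zip(xs, ys)) for j in range(m)]

    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(mat[r][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = mat[r][col] / mat[col][col]
            for k in range(col, m):
                mat[r][k] -= factor * mat[col][k]
            rhs[r] -= factor * rhs[col]

    # Back substitution; c[j] multiplies x^j
    c = [0.0] * m
    for r in range(m - 1, -1, -1):
        c[r] = (rhs[r] - sum(mat[r][k] * c[k] for k in range(r + 1, m))) / mat[r][r]
    return c[::-1]


# Table 4-3: ambient temperature T (x_i) and breakdown voltage V_B (y_i)
T = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
VB = [425, 400, 366, 345, 283, 298, 205, 189, 83, 22]
p = polyfit_normal_equations(T, VB, 2)
print(["%.4f" % coef for coef in p])  # expected to agree with -0.0334, -0.6540, 426.0500
```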
Similar techniques can be used to fit higher-degree polynomials to experimental data. Obviously, the difficulty in determining the best values for the polynomial coefficients increases as the degree of the polynomial increases. However, there are very effective matrix formulations of the problem that lend themselves readily to computational methods.

Exercise 4-6.1

Four light bulbs are tested to establish a relationship between lifetime and operating voltage. The resulting data are shown in the following table:

i     V, x_i     Hrs., y_i
1     105        1400
2     110        1200
3     115        1120
4     120        950

Find the coefficients of the linear regression curve and plot it and the scatter diagram.

Answers: -28.6, 4385

Exercise 4-6.2

Assume that the linear regression curve determined in Exercise 4-6.1 holds for all values of voltage. Find the expected lifetime of a light bulb operating at a voltage of
a) 90 V
b) 112 V
c) 130 V.

Answers: 1182, 1811, 667

4-7 Correlation between Two Sets of Data

A topic that is closely related to the concept of linear regression is that of determining whether two sets of observed data are correlated. The degree of such correlation is obtained from the linear correlation coefficient. This coefficient may lie between -1 and +1 and is zero if there is no
correlation between the two sets of data. The definition of linear correlation used here assumes that each set of data has exactly n samples, although more general definitions are possible. The linear correlation coefficient (referred to in the statistical literature as Pearson's r) is obtained from

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{4-25}$$

where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{and} \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Because the observed sample values are random, the calculated value of r is also random. When n is large (say, greater than 500), the distribution of r is approximately Gaussian.

The linear correlation coefficient may be useful in determining the sources of errors that arise in a system. If one observes quantities that might lead to an error at the same time that the errors are observed, then those quantities that show a significant positive correlation with the error are likely to be major contributors to the error. What value of r is significant depends upon the number of samples observed and the distribution functions of these samples, but generally a value greater than 0.5 may be considered significant. Small values of r are relatively meaningless unless n is very large and the probability distributions of x and y are known.

As an example of the use of the linear correlation coefficient, consider a point-to-point digital communication link using highly directional antennas. A measure of the quality of this link is the probability of bit error, which is also called the bit error rate (BER). It is observed in such a system that the BER may fluctuate randomly at a fairly slow rate. A possible cause for this fluctuation is the wind, which produces atmospheric turbulence and vibration in the antenna structures. For the purpose of this example, assume that 20 measurements of wind velocity are made simultaneously with measurements of BER.
The resulting data are displayed in Figure 4-7, in which the BER has been scaled by $10^8$ so that it can be plotted on the same scale as the wind velocity. Using these data in (4-25) leads to r = 0.891, from which it may be concluded that wind velocity is a major contributor to errors in the transmission channel. Note that the plot of these data is not very helpful in making such a conclusion because of the large variability of the data. Note also that the data would show a large variation around the linear regression curves.
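Equation (4-25) is straightforward to evaluate by computer. The 20 wind/BER measurements are shown only graphically, so the minimal Python sketch below (the function name `pearson_r` is our own) is exercised instead on the light-bulb data of Exercise 4-6.1, where lifetime falls almost linearly with voltage and r should therefore be close to -1.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r of (4-25) for two data sets of n samples each."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length data sets with at least two samples")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Light-bulb data of Exercise 4-6.1 (voltage, lifetime in hours)
r_bulbs = pearson_r([105, 110, 115, 120], [1400, 1200, 1120, 950])
print(f"r = {r_bulbs:.4f}")  # r = -0.9883
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same Pearson coefficient.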
Figure 4-7 Sample values of wind velocity and BER. [Plot against sample number 0 to 20; x marks the scaled BER, o marks the wind velocity.]

PROBLEMS

4-2.1 A calculator with a random number generator produces the following sequence of random numbers: 0.276, 0.123, 0.072, 0.324, 0.815, 0.312, 0.432, 0.283, 0.717.
a) Find the sample mean.
b) If the calculator produces three-digit random numbers that are uniformly distributed between 0.000 and 0.999, find the variance of the sample mean.
c) How large should the sample size be in order to obtain a sample mean whose standard deviation is no greater than 0.01?

4-2.2 Generate 30 sets of 10 each of a uniformly distributed random variable extending over the interval (0, 10). For each set of samples compute the estimate of the population mean, and from these 30 values for the mean compute the variance of the estimate. Repeat this five times and compare the results with the theoretical value for the variance of the estimate given by equation (4-4).

4-2.3 Repeat Problem 4-2.2 using 30 sets of 30 each of a random variable having a Gaussian distribution with zero mean and standard deviation of 10.
4-2.4 A political poll is assessing the relative strengths of two presidential candidates. A value of +1 is assigned to every person who states a preference for candidate A and a value of -1 is assigned to anyone who indicates a preference for candidate B.
a) Find the sample mean if 60% of those polled indicate a preference for candidate A.
b) Write an expression for the sample mean as a function of the sample size and the percentage of those polled that are in favor of candidate A.
c) Find the sample size necessary to estimate the percentage of persons in favor of candidate A with a standard deviation no greater than 0.1%.

4-2.5 In a class of 50 students, the result of a particular examination is a true mean of 70 and a true variance of 12. It is desired to estimate the mean by sampling, without replacement, a subset of the scores.
a) Find the standard deviation of the sample mean if only 10 scores are used.
b) How large should the sample size be for the standard deviation of the sample mean to be one percentage point (out of 100)?
c) How large should the sample size be for the standard deviation of the sample mean to be 1% of the true mean?

4-2.6 The HYGAYN Transistor Company produces a line of bipolar transistors that has an average current gain of 120 with a standard deviation of 10. Another company, ACE Electronics, produces a similar line of transistors with the same average current gain but with a standard deviation of 5.
Ed Engineer purchases 20 transistors from each company and mixes them together.
a) If Ed selects a random sample of five transistors with replacement, find the variance of the sample mean.
b) If Ed selects a random sample of five transistors without replacement, find the variance of the sample mean.
c) How large a sample size should Ed use, without replacement, in order to obtain a standard deviation of the sample mean of 2?

4-2.7 For the transistors of Problem 4-2.6, assume that the current gains are independent Gaussian random variables.
a) If Ed selects a random sample of 10 transistors with replacement, find the probability that the sample mean is within 2% of the true mean.
b) Repeat part (a) if the sampling is without replacement.

4-3.1 a) For the random numbers given in Problem 4-2.1, find the sample variance if an unbiased estimator is used.
b) Find the variance of this estimate of the population variance.

4-3.2 A zero-mean Gaussian random time function is sampled so as to obtain independent sample values. How many sample values are required to obtain an unbiased estimate of the variance of the time function with a standard deviation that is 2% of the true variance?

4-3.3 It is desired to estimate the variance of a random phase angle that is uniformly distributed over a range of $2\pi$. Find the number of independent samples that are required to estimate this variance with a standard deviation that is 5% of the true variance if an unbiased estimate is used.

4-3.4 Independent samples are taken from a random time function having a probability density function of

$$f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

How many samples are required to estimate the variance of this time function with a standard deviation that is five percent of the true value if an unbiased estimator is used?

4-4.1 a) Calculate the value of the Student's t probability density function for t = 2 and for 6 degrees of freedom.
b) Repeat (a) for 12 degrees of freedom.

4-4.2 A very large population of bipolar transistors has a current gain with a mean value of 120 and a standard deviation of 10. The values of current gain may be assumed to be independent Gaussian random variables.
a) Find the confidence limits for a confidence level of 90% on the sample mean if it is computed from a sample size of 150.
b) Repeat part (a) if the sample size is 21.

4-4.3 Repeat Problem 4-4.2 if a one-sided confidence interval is considered. That is, find the value of current gain above which 90% of the sample means would lie.
4-5.1 The resistance of coils manufactured by a certain company is claimed to have a mean value of resistance of 100 Ω. A sample of 9 coils is taken and it is found that the sample mean is 115 Ω and the sample standard deviation is 20 Ω.
a) Is the claim justified if a 95% confidence level is used?
b) Is the claim justified if a 90% confidence level is used?

4-5.2 Repeat Problem 4-5.1 if the sample size is 50 coils, the sample mean is still 115 Ω, and the sample standard deviation is 10 Ω.

4-5.3 Write a MATLAB function for the Student's t probability density function and plot the probability density function for $-5 \le t \le 5$ and v = 4 and 12.

4-5.4 A manufacturer of traveling wave tubes claims the mean lifetime is at least 4 years. Twenty of these tubes are installed in a communication satellite and a record kept of their performance. It is found that the mean lifetime of this sample is 3.7 years and the standard deviation of the sample is 1 year.
a) For what confidence level would the company's claim be valid?
b) What must the mean lifetime of the tubes have been in order for the claim to be valid at a confidence level of 90%?

4-5.5 A manufacturer of capacitors claims the breakdown voltage has a mean value of at least 100 V. A test of nine capacitors yielded breakdown voltages of 97, 104, 95, 98, 106, 92, 110, 103, and 93 V.
a) Find the sample mean.
b) Find the sample variance using an unbiased estimate.
c) Is the manufacturer's claim valid if a confidence level of 95% is employed?

4-6.1 Data are taken for a random variable Y as a function of another variable X. The x-values are 1, 3, 4, 6, 8, 9, 11, 14 and the corresponding y-values are I I, U., 14, 15, 17, 18, 19.
a) Plot the scatter diagram for these data.
b) Find the linear regression curve that best fits these data.

4-6.2 A test is made of the breakdown voltage of capacitors as a function of the capacitance. For capacitance values of 0.0001, 0.001, 0.01, 0.1, 1, and 10 µF the corresponding breakdown voltages are 310, 290, 285, 270, 260, and 225 V.
a) Plot the scatter diagram for these data on a semi-log coordinate system.
b) Find the linear regression curve that best fits these data on a semi-log coordinate system.

4-6.3 It is possible to use least-squares methods with curves other than polynomials. For example, consider a hyperbolic curve of the form

$$y = \frac{1}{a + bx}$$

Data corresponding to such a curve can be handled by fitting a first-order polynomial to the reciprocal of y, i.e., $g = 1/y = a + bx$, from which the coefficients a and b can be found. Use this procedure to fit the following data set with a hyperbolic regression curve of this form. Is this a valid least-squares fit to the y data?

x    0      1      2      3      4      5      6      7      8      9      10
y    1.04   0.49   0.27   0.27   0.16   0.16   0.06   0.08   0.16   0.12   0.07

References

1. Mosteller, F., S. E. Fienberg, and R. E. K. Rourke, Beginning Statistics with Data Analysis. Reading, Mass.: Addison-Wesley, 1983.
A recent undergraduate text that emphasizes the data analysis aspects of statistics. The mathematical level of the text is somewhat low for engineering students, but the numerous excellent examples illustrate the concepts very well.

2. Spiegel, M. R., Theory and Problems of Probability and Statistics. Schaum's Outline Series in Mathematics. New York: McGraw-Hill, Inc., 1975.
The chapters on statistics in this outline give concise and well-organized definitions of many of the concepts presented in this chapter. In addition, there are many excellent problems for which answers are provided.
CHAPTER 5

Random Processes

5-1 Introduction

It was noted in Chapter 2 that a random process is a collection of time functions and an associated probability description. The probability description may consist of the marginal and joint probability density functions of all random variables that are point functions of the process at specified time instants. This type of probability description is the only one that will be considered here.

The entire collection of time functions is an ensemble and will be designated as {x(t)}, where any particular member of the ensemble, x(t), is a sample function of the ensemble. In general, only one sample function of a random process can ever be observed; the other sample functions represent all of the other possible realizations that might have occurred but did not. An arbitrary sample function is denoted X(t). The value of X(t) at any time $t_1$ defines a random variable denoted as $X(t_1)$ or simply $X_1$.

The extension of the concepts of random variables to those of random processes is quite simple as far as the mechanics are concerned; in fact, all of the essential ideas have already been considered. A more difficult step, however, is the conceptual one of relating the mathematical representations for random variables to the physical properties of the process. Hence, the purpose of this chapter is to help clarify this relationship by means of a number of illustrative examples.

Many different classes of random processes arise in engineering problems. Since methods of representing these processes most efficiently depend upon the nature of the process under consideration, it is necessary to classify random processes in a manner that assists in determining an appropriate type of representation.
Furthermore, it is important to develop a terminology that enables us to specify the class of process under consideration in a concise, but complete, manner so that there is no uncertainty as to which process is being discussed.

Therefore, one of the first steps in discussing random processes is that of developing a terminology that can be used as a "short-cut" in the description of the characteristics of any given process. A convenient way of doing this is to use a set of descriptors, arranged in pairs,
and to select one name from each pair to describe the process. Those pairs of descriptors that are appropriate in the present discussion are

1. Continuous; discrete
2. Deterministic; nondeterministic
3. Stationary; nonstationary
4. Ergodic; nonergodic

Exercise 5-1.1

a) If it is assumed that any random process can be described by picking one descriptor from each pair of descriptors shown above, how many classes of random processes can be described?
b) It is also possible to consider mixed processes in which two or more random processes of the type described in (a) above are combined to form a single random process. If two random processes of the type described in (a) are combined, what is the total number of classes of random processes that can be described now by the above list of descriptors?

Answers: 16, 256

Exercise 5-1.2

a) A time function is generated by flipping two coins once every second. A value of +1 is assigned to each head and a value of -1 is assigned to each tail. The time function has a constant value equal to that obtained from the sum of the two coins for 1 second and then changes to the new value determined by the outcome on the next flip of the coins. Sketch a typical sample function of the random process defined in this way. Let the sample function be 8 seconds long and let it exhibit all possible states with the correct probabilities.
b) How many possible sample functions, each 8 seconds long, does the entire ensemble of sample functions for this random process have?

Answer: 6561
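The counts in Exercises 5-1.1 and 5-1.2 follow from simple enumeration: $2^4$ ways to pick one descriptor from each of four pairs, $16^2$ combinations of two such classes (counted here as ordered pairs, the interpretation that yields 256), and $3^8$ sequences of the three possible coin sums. A short Python sketch that enumerates these cases and also gives the state probabilities needed for the sketch in 5-1.2(a):

```python
from collections import Counter
from itertools import product

# The four descriptor pairs listed above
pairs = [
    ("continuous", "discrete"),
    ("deterministic", "nondeterministic"),
    ("stationary", "nonstationary"),
    ("ergodic", "nonergodic"),
]

classes = list(product(*pairs))              # Exercise 5-1.1(a): one descriptor per pair
mixed = list(product(classes, repeat=2))     # 5-1.1(b): ordered pairs of classes

# Exercise 5-1.2: each second the sum of two +/-1 coins is -2, 0, or +2
coin = (-1, +1)
outcomes = [c1 + c2 for c1, c2 in product(coin, repeat=2)]
counts = Counter(outcomes)
probs = {s: counts[s] / len(outcomes) for s in sorted(counts)}
n_sample_functions = len(counts) ** 8        # one of 3 values in each of 8 one-second slots

print(len(classes), len(mixed), n_sample_functions)  # 16 256 6561
print(probs)                                          # {-2: 0.25, 0: 0.5, 2: 0.25}
```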
5-2 Continuous and Discrete Random Processes

These terms normally apply to the possible values of the random variables. A continuous random process is one in which random variables such as $X(t_1)$, $X(t_2)$, and so on, can assume any value within a specified range of possible values. This range may be finite, infinite, or semi-infinite. Such things as thermal agitation noise in conductors, shot noise in electron tubes or transistors, and wind velocity are examples of continuous random processes. A sketch of a typical sample function and the corresponding probability density function is shown in Figure 5-1. In this example, the range of possible values is semi-infinite.

A more precise definition for continuous random processes would be that the probability distribution function is continuous. This would also imply that the density function has no $\delta$ functions in it.

A discrete random process is one in which the random variables can assume only certain isolated values (possibly infinite in number) and no other values. For example, a voltage that is either 0 or 100 because of random opening and closing of a switch would be a sample function from a discrete random process. This is illustrated in Figure 5-2. Note that the probability density function contains only $\delta$ functions.

It is also possible to have mixed processes, which have both continuous and discrete components. For example, the current flowing in an ideal rectifier may be zero for one-half the time, as shown in Figure 5-3. The corresponding probability density has both a continuous part and a $\delta$ function.

Some other examples of random processes will serve to further illustrate the concept of continuous and discrete random processes. Thermal noise in an electronic circuit is a typical example of a continuous random process since its amplitude can take on any positive or negative value.
The probability density function of thermal noise is a continuous function fromminus infinityto plus infinity. Quantizing error associated with analog-to-digital conversion, asdiscussed in Section 2-7, is another example of a continuous random process since this errormay have any value within a finite range of values determined by the size of the incrementbetween quantization levels. The probability density function for the quantizing error is usuallyassumed to be uniformly distributed over the range of possible errors. This case repr�sents af(x)0(a) (b)Figure 5-1 A continuous random process: (a) typical sample function and (b) probability densityfunction.
- 203. 1920X(t)100 CHAPTER 5 · RANDOM PROCESSES(a)f(x)12(b)12Figure 5-2 A discrete random process: (a) typical sample function and (b) probability density function.X(t)0(a)0f(x)12(b)Figure 5-3 A mixed random process: (a) typical sample function and (b) probability density function.minor departure from the strict mathematical definition for a continuous probability densityfunction since the uniform density function is not continuous at the end points. Nevertheless,since the density function does not contain any 8 functions, we consider the random process tobe continuous for purposes of our classification.On the other hand, if one represents the number of telephone calls in progress in a telephonesystem as a random process, the resulting process is discrete since the number of calls must bean integer. The probability density function for this process contains only a large number of 8functions. Another example of a discrete random process is the result of quantizing a samplefunction from a continuous random process into another random proc�ss that can have only afinite number ofpossible values. For example, an 8-bit analog-to-digital converter takes an inputsignal that may have a continuous probability density function and converts it into one that hasa discrete probability density function with 256 8 functions.Finally we consider some mixed processes that have both a continuous component and adiscrete component. One such example is the rectified time function as noted above. Anotherexample might be a system containing a limiter such that when the output magnitude is less thanthe limiting value, it has the same value as the input. However, the output magnitude can neverexceed the limiting value regardless of how large the input becomes. Thus, a sample function
from a continuous random process on the input will produce a sample function from a mixed random process on the output, and the probability density function of the output will have both a continuous part and a pair of $\delta$ functions.

In all of the cases just mentioned, the sample functions are continuous in time; that is, a random variable may be defined for any time. Situations in which the random variables exist for particular time instants only (referred to as point processes or time series) are not discussed in this chapter.

Exercise 5-2.1

A random noise having a Rayleigh probability density function with a mean of 1 V is added to a dc voltage having a value of either +1 or -1 V with equal probability.
a) Classify the resulting signal as continuous, discrete, or mixed.
b) Repeat the classification after the signal is passed through a half-wave rectifier.

Answers: Mixed, continuous

Exercise 5-2.2

A random time function has a mean value of 1 and an amplitude that has an exponential distribution. This function is multiplied by a sinusoid of unit amplitude and phase uniformly distributed over $(0, 2\pi)$.
a) Classify the product as continuous, discrete, or mixed.
b) Classify the product after it has passed through an ideal hard limiter having an input-output characteristic given by $V_{out} = \operatorname{sgn}(V_{in})$.
c) Classify the product assuming the sinusoid is passed through a half-wave rectifier before multiplying the exponentially distributed time function and the sinusoid.

Answers: Mixed, continuous, discrete
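The limiter and quantizer examples above can be checked numerically. The Python sketch below passes Gaussian samples (an assumed input; any continuous input would serve) through an ideal limiter and through a hypothetical 8-bit uniform quantizer covering [-4, 4). A nonzero fraction of the limiter outputs lands exactly on the limits: that probability mass is what appears as the pair of $\delta$ functions, and for a unit-variance Gaussian input with limit 1 it should be about erfc(1/√2) ≈ 0.317. Between the limits the output values are all distinct (the continuous part), while the quantizer output takes at most 256 distinct values.

```python
import math
import random

random.seed(0)
N = 100_000
LIMIT = 1.0

# Continuous input: zero-mean, unit-variance Gaussian samples (an assumed input)
x = [random.gauss(0.0, 1.0) for _ in range(N)]

# Ideal limiter: output equals the input until |x| reaches LIMIT, then saturates
limited = [max(-LIMIT, min(LIMIT, xi)) for xi in x]
frac_at_limits = sum(1 for v in limited if abs(v) == LIMIT) / N
expected_mass = math.erfc(LIMIT / math.sqrt(2))  # P(|X| >= LIMIT) for a unit Gaussian
inner_distinct = len({v for v in limited if abs(v) < LIMIT})

# Hypothetical 8-bit uniform quantizer covering [-4, 4): 256 output levels
LEVELS, LO, HI = 256, -4.0, 4.0
STEP = (HI - LO) / LEVELS


def quantize(v):
    code = int((v - LO) // STEP)
    code = max(0, min(LEVELS - 1, code))  # out-of-range inputs go to the end codes
    return LO + (code + 0.5) * STEP       # reconstruct at the center of the cell


distinct_levels = len({quantize(xi) for xi in x})

print(f"fraction at limits: {frac_at_limits:.4f} (theory {expected_mass:.4f})")
print(f"distinct limiter outputs strictly inside the limits: {inner_distinct}")
print(f"distinct quantizer outputs: {distinct_levels} (at most {LEVELS})")
```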
5-3 Deterministic and Nondeterministic Random Processes

In most of the discussion so far, it has been implied that each sample function is a random function of time and, as such, its future values cannot be exactly predicted from the observed past values. Such a random process is said to be nondeterministic. Almost all natural random processes are nondeterministic because the basic mechanism that generates them is either unobservable or extremely complex. All the examples presented in Section 5-2 are nondeterministic.

It is possible, however, to define random processes for which the future values of any sample function can be exactly predicted from a knowledge of the past values. Such a process is said to be deterministic. As an example, consider a random process for which each sample function of the process is of the form

$$X(t) = A\cos(\omega t + \theta) \tag{5-1}$$

where $A$ and $\omega$ are constants and $\theta$ is a random variable with a specified probability distribution. That is, for any one sample function, $\theta$ has the same value for all $t$ but different values for the other members of the ensemble. In this case, the only random variation is over the ensemble, not with respect to time. It is still possible to define random variables $X(t_1)$, $X(t_2)$, and so on, and to determine probability density functions for them.

As a second example of a deterministic process, consider a periodic random process having sample functions of the form

$$X(t) = \sum_{n=0}^{\infty} \left[A_n \cos(2\pi n f_0 t) + B_n \sin(2\pi n f_0 t)\right] \tag{5-2}$$

in which the $A_n$ and the $B_n$ are independent random variables that are fixed for any one sample function but are different from sample function to sample function. Given the past history of any sample function, one can determine these coefficients and predict exactly all future values of X(t).

It is not necessary that deterministic processes be periodic, although this is probably the most common situation that arises in practical applications.
For example, a deterministic random process might have sample functions of the form

$$X(t) = A\exp(-\beta t) \tag{5-3}$$

in which $A$ and $\beta$ are random variables that are fixed for any one sample function but vary from sample function to sample function.

Although the concept of deterministic random processes may seem a little artificial, it often is convenient to obtain a probability model for signals that are known except for one or two parameters. The process described by (5-1), for example, may be suitable to represent a radio signal in which the magnitude and frequency are known, but the phase is not because the precise distance (within a fraction of a wavelength) between transmitter and receiver is not.
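To make the idea of predictability concrete for (5-3): two noise-free observations of one sample function fix both random parameters, $\beta = \ln[X(t_1)/X(t_2)]/(t_2 - t_1)$ and $A = X(t_1)e^{\beta t_1}$, after which every future value is known exactly. The Python sketch below (the helper name `fit_exponential` is illustrative) applies this to the sample values given in Exercise 5-3.1.

```python
import math


def fit_exponential(t1, x1, t2, x2):
    """Recover A and beta of X(t) = A*exp(-beta*t), as in (5-3), from two exact samples."""
    beta = math.log(x1 / x2) / (t2 - t1)
    A = x1 * math.exp(beta * t1)
    return A, beta


A, beta = fit_exponential(1.0, 1.21306, 2.0, 0.73576)
x_future = A * math.exp(-beta * 3.2189)   # prediction from the two past samples alone
print(f"A = {A:.3f}, beta = {beta:.3f}, X(3.2189) = {x_future:.3f}")
# prints: A = 2.000, beta = 0.500, X(3.2189) = 0.400
```

The same reasoning fails for a nondeterministic process: no finite set of past samples pins down its future values.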
Exercise 5-3.1

A sample function of the random process described by equation (5-3) is observed to have the following values: X(1) = 1.21306 and X(2) = 0.73576.
a) Find the values of $A$ and $\beta$.
b) Find the value X(3.2189).

Answers: 0.4, 0.5, 2.0

Exercise 5-3.2

A random process has sample functions of the form

$$X(t) = \sum_{n=-\infty}^{\infty} A_n f(t - n t_1)$$

where the $A_n$ are independent random variables that are uniformly distributed from 0 to 10, and

$$f(t) = \begin{cases} 1, & 0 \le t \le t_1/2 \\ 0, & \text{elsewhere} \end{cases}$$

a) Is this process deterministic or nondeterministic? Why?
b) Is this process continuous, discrete, or mixed? Why?

Answers: Nondeterministic, mixed

5-4 Stationary and Nonstationary Random Processes

It has been noted that one can define a probability density function for random variables of the form $X(t_1)$, but so far no mention has been made of the dependence of this density function on the value of time $t_1$. If all marginal and joint density functions of the process do not depend upon the choice of time origin, the process is said to be stationary. In this case, all of the mean values and moments discussed previously are constants that do not depend upon the absolute value of time.

If any of the probability density functions do change with the choice of time origin, the process is nonstationary. In this case, one or more of the mean values or moments will also depend on
time. Since the analysis of systems responding to nonstationary random inputs is more involved than in the stationary case, all future discussions are limited to the stationary case unless it is specifically stated to the contrary.

In a rigorous sense, there are no stationary random processes that actually exist physically, since any process must have started at some finite time in the past and must presumably stop at some finite time in the future. However, there are many physical situations in which the process does not change appreciably during the time it is being observed. In these cases the stationary assumption leads to a convenient mathematical model, which closely approximates reality.

Determining whether or not the stationary assumption is reasonable for any given situation may not be easy. For nondeterministic processes, it depends upon the mechanism of generating the process and upon the time duration over which the process is observed. As a rule of thumb, it is customary to assume stationarity, unless there is some obvious change in the source or unless common sense dictates otherwise. For example, the thermal noise generated by the random motion of electrons in a resistor might reasonably be considered stationary under normal conditions. However, if this resistor were being intermittently heated by a current through it, the stationary assumption would obviously be false. As another example, it might be reasonable to assume that random wind velocity comes from a stationary source over a period of 1 hour, say, but common sense indicates that applying this same assumption to a period of 1 week might be unreasonable.

Deterministic processes are usually stationary only under certain very special conditions. It is customary to assume that these conditions exist, but one must be aware that this is a deliberate choice and not necessarily a natural occurrence.
For example, in the case of the random process defined by (5-1), the reader may easily show (by calculating the mean value) that the process may be (and, in fact, is) stationary when $\theta$ is uniformly distributed over a range from 0 to $2\pi$, but that it is definitely not stationary when $\theta$ is uniformly distributed over a range from 0 to $\pi$. The random process defined by (5-2) can be shown to be stationary if the $A_n$ and the $B_n$ are independent, zero-mean, Gaussian random variables, with coefficients of the same index having equal variances. Under most other situations, however, this random process will be nonstationary. The random process defined by (5-3) is nonstationary under all circumstances.

The requirement that all marginal and joint density functions be independent of the choice of time origin is frequently more stringent than is necessary for systems analysis. A more relaxed requirement, which is often adequate, is that the mean value of any random variable, $X(t_1)$, is independent of the choice of $t_1$, and that the correlation of two random variables, $\overline{X(t_1)X(t_2)}$, depends only upon the time difference, $t_2 - t_1$. Processes that satisfy these two conditions are said to be stationary in the wide sense. This wide-sense stationarity is adequate to guarantee that the mean value, mean-square value, variance, and correlation coefficient of any pair of random variables are constants independent of the choice of time origin.

In subsequent discussions of the response of systems to random inputs it will be found that the evaluation of this response is made much easier when the processes may be assumed either strictly stationary or stationary in the wide sense. Since the results are identical for either type of stationarity, it is not necessary to distinguish between the two in any future discussion.
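The claim about (5-1) can be checked by computing ensemble averages directly. For $\theta$ uniform on $(0, \pi)$ the mean works out to $-(2A/\pi)\sin\omega t$, which depends on $t$; for $\theta$ uniform on $(0, 2\pi)$ it is zero for every $t$, and the correlation $\overline{X(t_1)X(t_2)} = (A^2/2)\cos\omega(t_2 - t_1)$ depends only on the time difference. The Python sketch below evaluates these averages by midpoint-rule integration over $\theta$; the values of A and $\omega$ are arbitrary illustrative choices, not values from the text.

```python
import math


def ensemble_average(g, theta_lo, theta_hi, m=20_000):
    """Average of g(theta) for theta uniform on (theta_lo, theta_hi), via the midpoint rule."""
    h = (theta_hi - theta_lo) / m
    return sum(g(theta_lo + (k + 0.5) * h) for k in range(m)) / m


A, W = 1.0, 2 * math.pi  # illustrative amplitude and radian frequency


def mean_at(t, theta_hi):
    """Ensemble mean of X(t) = A cos(W t + theta) with theta uniform on (0, theta_hi)."""
    return ensemble_average(lambda th: A * math.cos(W * t + th), 0.0, theta_hi)


def correlation(t1, t2, theta_hi=2 * math.pi):
    """Ensemble average of X(t1) X(t2) for the same process."""
    return ensemble_average(
        lambda th: A * math.cos(W * t1 + th) * A * math.cos(W * t2 + th), 0.0, theta_hi)


for t in (0.0, 0.1, 0.25):
    print(f"t = {t}: mean with theta on (0, 2pi) = {mean_at(t, 2 * math.pi):+.6f}, "
          f"with theta on (0, pi) = {mean_at(t, math.pi):+.6f}")
print(correlation(0.1, 0.3), correlation(0.5, 0.7))  # equal: depends only on t2 - t1
```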
Exercise 5-4.1

a) For the random process described in Exercise 5-3.2, find the mean value of the random variable $X(t_1/4)$.
b) Find the mean value of the random variable $X(3t_1/4)$.
c) Is the process stationary? Why?

Answers: No, 5, 0

Exercise 5-4.2

A random process is described by

$$X(t) = A + B\cos(\omega t + \theta)$$

where $A$ is a random variable that is uniformly distributed between -3 and +3, $B$ is a random variable with zero mean and variance of 4, $\omega$ is a constant, and $\theta$ is a random variable that is uniformly distributed from $-\pi/2$ to $+3\pi/2$. $A$, $B$, and $\theta$ are statistically independent. Calculate the mean and variance of this process. Is the process stationary in the wide sense?

Answers: 5, wide sense stationary

5-5 Ergodic and Nonergodic Random Processes

Some stationary random processes possess the property that almost every member¹ of the ensemble exhibits the same statistical behavior as the whole ensemble. Thus, it is possible to determine this statistical behavior by examining only one typical sample function. Such processes are said to be ergodic.

For ergodic processes, the mean values and moments can be determined by time averages as well as by ensemble averages. Thus, for example, the nth moment is given by

$$\overline{X^n} = \int_{-\infty}^{\infty} x^n f(x)\,dx = \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X^n(t)\,dt \tag{5-4}$$

It should be emphasized, however, that this condition cannot exist unless the process is stationary. Thus, ergodic processes are also stationary processes.
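Equation (5-4) can be illustrated with the random-phase sinusoid (5-1), taking $\theta$ uniform over $(0, 2\pi)$. Its ensemble moments are $\overline{X} = 0$ and $\overline{X^2} = A^2/2$, and a single sample function (one fixed draw of $\theta$) gives the same values when time-averaged over a long interval. The Python sketch below approximates the time average in (5-4) for n = 1 and 2 with a midpoint rule over a finite T; the amplitude, frequency, and T are illustrative assumptions, not values from the text.

```python
import math
import random

A, W = 2.0, 2 * math.pi   # illustrative constants: amplitude 2, period 1


def time_average(x, T, m=100_000):
    """Midpoint-rule approximation of (1/2T) * integral of x(t) dt over (-T, T)."""
    h = 2 * T / m
    return sum(x(-T + (k + 0.5) * h) for k in range(m)) / m


random.seed(3)
theta = random.uniform(0.0, 2 * math.pi)   # one draw of theta selects one sample function


def sample_fn(t):
    return A * math.cos(W * t + theta)


time_mean = time_average(sample_fn, T=50.0)
time_mean_square = time_average(lambda t: sample_fn(t) ** 2, T=50.0)

# Ensemble moments of this process: mean 0 and mean square A^2/2
print(f"time mean = {time_mean:.6f}, ensemble mean = 0")
print(f"time mean square = {time_mean_square:.6f}, ensemble mean square = {A ** 2 / 2}")
```

Because T spans a whole number of periods here, the finite-T averages already equal the ensemble values; for other choices of T they only approach them as T grows.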
A process that does not possess the property of (5-4) is nonergodic. All nonstationary processes are nonergodic, but it is also possible for stationary processes to be nonergodic. For example, consider sample functions of the form

$$X(t) = Y\cos(\omega t + \theta) \tag{5-5}$$

where $\omega$ is a constant, $Y$ is a random variable (with respect to the ensemble), and $\theta$ is a random variable that is uniformly distributed over 0 to $2\pi$, with $\theta$ and $Y$ being statistically independent. This process can be shown to be stationary but nonergodic, since $Y$ is a constant in any one sample function but is different for different sample functions.

It is generally difficult, if not impossible, to prove that ergodicity is a reasonable assumption for any physical process, since only one sample function of the process can be observed. Nevertheless, it is customary to assume ergodicity unless there are compelling physical reasons for not doing so.

Exercise 5-5.1

State whether each of the following processes is ergodic or nonergodic and why.
a) A random process in which the random variable is the number of cars per minute passing a traffic counter.
b) The thermal noise generated by a resistor.
c) The random process that results when a Gaussian random process is passed through an ideal half-wave rectifier.
d) A random process having sample functions of the form $X(t) = A + B\cos(\omega t + \theta)$, where $A$ is a constant, $B$ is a random variable uniformly distributed from 0 to ∞, and $\theta$ is a random variable that is uniformly distributed between 0 and $2\pi$.

Answers: Ergodic, nonergodic (nonstationary), ergodic, ergodic

¹ The term "almost every member" implies that a set of sample functions having total probability of zero may not exhibit the same behavior as the rest of the ensemble. But having zero probability does not mean that such a sample function is impossible.
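The nonergodic example (5-5) can be seen numerically. For any one sample function the time-averaged mean square is $Y^2/2$, which changes from one sample function to the next, whereas the ensemble mean square is $\overline{Y^2}/2$. The Python sketch below draws three sample functions; the uniform (0, 2) distribution for $Y$ (so that $\overline{Y^2} = 4/3$) and the frequency are assumptions made only for illustration.

```python
import math
import random

W = 2 * math.pi  # illustrative radian frequency (period 1)


def time_mean_square(Y, theta, T=20.0, m=40_000):
    """Midpoint approximation of (1/2T) * integral over (-T, T) of [Y cos(W t + theta)]^2."""
    h = 2 * T / m
    return sum((Y * math.cos(W * (-T + (k + 0.5) * h) + theta)) ** 2 for k in range(m)) / m


random.seed(7)
# Assumed distribution for Y (not from the text): uniform on (0, 2), so E[Y^2] = 4/3
draws = [(random.uniform(0.0, 2.0), random.uniform(0.0, 2 * math.pi)) for _ in range(3)]
for Y, theta in draws:
    print(f"Y = {Y:.3f}: time-average mean square = {time_mean_square(Y, theta):.4f}"
          f" (Y^2/2 = {Y * Y / 2:.4f})")
print(f"ensemble mean square E[Y^2]/2 = {(4 / 3) / 2:.4f}")
```

Each time average converges to its own $Y^2/2$, so no single sample function reveals the ensemble value, which is exactly what nonergodic means here.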
- 210. Exercise 5-5.2
A random process has sample functions of the form
$$X(t) = A\cos(\omega t + \theta)$$
where A is a random variable taking the value +1 or -1 with equal probability and θ is a random variable uniformly distributed between 0 and 2π.
a) Is X(t) a wide sense stationary process?
b) Is X(t) an ergodic process?

Answers: Yes, no

5-6 Measurement of Process Parameters

The statistical parameters of a random process are the sets of statistical parameters (such as mean, mean-square, and variance) associated with the X(t) random variables at various times t. In the case of a stationary process these parameters are the same for all such random variables, and, hence, it is customary to consider only one set of parameters.

A problem of considerable practical importance is that of estimating the process parameters from the observations of a single sample function (since one sample function of finite length is all that is ever available). Because there is only one sample function, it is not possible to make an ensemble average in order to obtain estimates of the parameters. The only alternative, therefore, is to make a time average. If the process is ergodic, this is a reasonable approach because a time average (over infinite time) is equivalent to an ensemble average, as indicated by (5-4). Of course, in most practical situations, we cannot prove that the process is ergodic and it is usually necessary to assume that it is ergodic unless there is some clear physical reason why it should not be.
Furthermore, it is not possible to take a time average over an infinite time interval, and a time average over a finite time interval will always be just an approximation to the true value. The following discussion is aimed at determining how good this approximation is, and upon what aspects of the measurement the goodness of the approximation depends.

Consider first the problem of estimating the mean value of an ergodic random process {X(t)}. This estimate will be designated as $\hat{X}$ and will be computed from a finite time average. Thus, for an arbitrary member of the ensemble, let
$$\hat{X} = \frac{1}{T}\int_0^T X(t)\,dt \qquad \text{(5-6)}$$
It should be noted that although $\hat{X}$ is a single number in any one experiment, it is also a random variable, since a different number would be obtained if a different time interval were used or if
- 211. a different sample function had been observed. Thus, $\hat{X}$ will not be identically equal to the true mean value $\bar{X}$, but if the measurement is to be useful it should be close to this value. Just how close it is likely to be is discussed below.

Since $\hat{X}$ is a random variable, it has a mean value and a variance. If $\hat{X}$ is to be a good estimate of $\bar{X}$, then the mean value of $\hat{X}$ should be equal to $\bar{X}$ and the variance should be small. From (5-6) the mean value of $\hat{X}$ is
$$E[\hat{X}] = E\left[\frac{1}{T}\int_0^T X(t)\,dt\right] = \frac{1}{T}\int_0^T E[X(t)]\,dt = \frac{1}{T}\int_0^T \bar{X}\,dt = \frac{1}{T}\Big[\bar{X}t\Big]_0^T = \bar{X} \qquad \text{(5-7)}$$
The interchange of expectation and integration is permissible in this case and represents a common type of operation. The conditions under which such interchanges are possible are discussed in more detail in Chapter 8. It is clear from (5-7) that $\hat{X}$ has the proper mean value. The evaluation of the variance of $\hat{X}$ is considerably more involved and requires a knowledge of autocorrelation functions, a topic that is considered in the next chapter. However, the variance of such estimates is considered below for the discrete time case. It is sufficient to note here that the variance turns out to be proportional to 1/T. Thus, a better estimate of the mean is found by averaging the sample function over a longer time interval. As T approaches infinity, the variance approaches zero and the estimate becomes equal with probability one to the true mean, as it must for an ergodic process.

As a practical matter, the integration required by (5-6) can seldom be carried out analytically because X(t) cannot be expressed in an explicit mathematical form. The alternative is to perform numerical integration upon samples of X(t) observed at equally spaced time instants. Thus, if $X_1 = X(\Delta t)$, $X_2 = X(2\Delta t)$, ...,
$X_N = X(N\Delta t)$, then the estimate of $\bar{X}$ may be expressed as
$$\hat{X} = \frac{1}{N}\sum_{i=1}^{N} X_i \qquad \text{(5-8)}$$
This is the discrete time counterpart of (5-6). The estimate $\hat{X}$ is still a random variable and has an expected value of
$$E[\hat{X}] = E\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\left(N\bar{X}\right) = \bar{X} \qquad \text{(5-9)}$$
Hence, the estimate still has the proper mean value.

To evaluate the variance of $\hat{X}$ it is assumed that the observed samples are spaced far enough apart in time so that they are statistically independent. This assumption is made for convenience
- 212. at this point; a more general derivation can be made after considering the material in Chapter 6. The mean-square value of $\hat{X}$ can be expressed as
$$E[(\hat{X})^2] = E\left[\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} X_iX_j\right] = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} E[X_iX_j] \qquad \text{(5-10)}$$
where the double summation comes from the product of two summations. Since the sample values have been assumed to be statistically independent, it follows that
$$E[X_iX_j] = \begin{cases} \overline{X^2}, & i = j \\ (\bar{X})^2, & i \ne j \end{cases}$$
Thus,
$$E[(\hat{X})^2] = \frac{1}{N}\overline{X^2} + \left(1 - \frac{1}{N}\right)(\bar{X})^2 \qquad \text{(5-11)}$$
This results from the fact that the double summation of (5-10) contains $N^2$ terms altogether, but only N of these correspond to i = j. Since $\overline{X^2} = \sigma_X^2 + (\bar{X})^2$, equation (5-11) can be written as
$$E[(\hat{X})^2] = \frac{1}{N}\sigma_X^2 + (\bar{X})^2 \qquad \text{(5-12)}$$
The variance of $\hat{X}$ can now be written as
$$\operatorname{Var}(\hat{X}) = E[(\hat{X})^2] - \{E[\hat{X}]\}^2 = \frac{1}{N}\sigma_X^2 + (\bar{X})^2 - (\bar{X})^2 = \frac{1}{N}\sigma_X^2 \qquad \text{(5-13)}$$
This result indicates that the variance of the estimate of the mean value is simply 1/N times the variance of the process. Thus, the quality of the estimate can be made better by averaging a larger number of samples.

As an illustration of the above result, suppose it is desired to estimate the variance of a zero-mean Gaussian random process by passing it through a square-law device and estimating the mean value of the output. Suppose it is also desired to find the number of sample values that must be averaged in order to be assured that the standard deviation of the resulting estimate is less than 10% of the true mean value.

Let the observed sample function of the zero-mean Gaussian process be Y(t) and have a variance of $\sigma_Y^2$. After this sample function is squared, it is designated as X(t). Thus,
$$X(t) = Y^2(t)$$
- 213. From (2-27) it follows that
$$\bar{X} = E[Y^2] = \sigma_Y^2$$
$$\overline{X^2} = E[Y^4] = 3\sigma_Y^4$$
Hence,
$$\sigma_X^2 = \overline{X^2} - (\bar{X})^2 = 3\sigma_Y^4 - \sigma_Y^4 = 2\sigma_Y^4$$
It is clear from this that an estimate of $\bar{X}$ is also an estimate of $\sigma_Y^2$. Furthermore, the variance of the estimate of $\bar{X}$ must be $0.01(\bar{X})^2 = 0.01\sigma_Y^4$ to meet the requirement of an error of less than 10%. From (5-13)
$$\operatorname{Var}(\hat{X}) = \frac{1}{N}\sigma_X^2 = \frac{2\sigma_Y^4}{N} = 0.01\sigma_Y^4$$
Thus, N = 200 statistically independent samples are required to achieve the desired accuracy.

The preceding not only illustrates the problems in estimating the mean value of a random process, but also indicates how the variance of a zero-mean process might be estimated. The same general procedures can be extended to estimate the variance of a nonzero-mean random process.

When the process whose variance is to be estimated has an unknown mean value, the procedure for estimating the variance becomes a little more involved. At first thought, it would seem that the logical thing to do is to find the average of the $X_i^2$ and then subtract out the square of the estimated mean as given by equation (5-8). It turns out, however, that the resulting estimate of the variance is biased; that is, the mean value of the estimate is not the true variance. This result occurs because the true mean is unknown. It is possible, however, to correct for this lack of knowledge by defining the estimate of the variance as
$$\hat{\sigma}_X^2 = \frac{1}{N-1}\sum_{i=1}^{N} X_i^2 - \frac{N}{N-1}(\hat{X})^2 \qquad \text{(5-14)}$$
It is left as an exercise for the student to show that the mean value of this estimate is indeed the true variance. The student should also compare this result with a similar result shown in equation (4-8) of the preceding chapter.

Exercise 5-6.1
Using a random number generator, obtain 100 random numbers uniformly distributed between 0 and 10. Using numerical methods
a) estimate the mean
- 214. b) estimate the variance of the process
c) estimate the standard deviation of the estimate of the mean.

Answers: Using MATLAB RAND function with a seed of 0: 5.1588, 8.333, 0.2887

Exercise 5-6.2
Show that the estimate of the variance given by equation (5-14) is an unbiased estimate. That is,
$$E[\hat{\sigma}_X^2] = \sigma_X^2$$

5-7 Smoothing Data with a Moving Window Average

The previous section discussed methods for estimating the mean and variance of a stationary process. In such cases it is always possible to increase the quality of the estimate by averaging over more samples. Practically, however, we are often faced with a situation in which the mean value of the process varies slowly with time and our concern is with extracting this variation from the noise that is obscuring it. Even if the noise is stationary, the mean value is not. For example, we may be interested in observing how the temperature of an electronic device changes with time after it is turned on, or in determining if an intermittent signal is present or not. In such cases, increasing the number of samples averaged may completely hide the variation we are trying to observe.

The above is the classic problem of extracting a low-frequency signal from noise having a bandwidth considerably greater than that of the signal. When both the signal and noise are continuous time functions, the estimation of the signal is usually accomplished by means of a low-pass filter. One limitation of physically realizable filters is that they cannot respond to future inputs. Such filters are considered in more detail in subsequent chapters. However, when the signal plus noise is sampled and the samples stored, it is possible to make an estimate of the mean value at any given time by using samples taken both before and after this time.
There are many ways of doing this, but perhaps the simplest (but not necessarily the best) is the moving window average.

Let the signal be represented by a set of samples $X_i$ and the added noise by samples $N_i$. The observed data are $Y_i = X_i + N_i$. An estimate of $X_i$ can be obtained from the moving window average defined as
$$\hat{X}_i = \frac{1}{n_L + n_R + 1}\sum_{k=-n_L}^{n_R} Y_{i+k} \qquad \text{(5-15)}$$
- 215. where $n_L$ is the number of sample points before the point at which the estimate is to be made and $n_R$ is the number of sample points after the desired point. Hence, the size of the window over which the data are averaged is $n_L + n_R + 1$. From the previous discussion on estimating the mean value of a random process, it is clear that making the window longer will yield a smoother estimate, but will also smooth the variations in $X_i$ that one wishes to observe. Obtaining the proper size for the window is largely a matter of trial and error because it depends upon the particular data that are available.

As an example of the moving window average, suppose there is an observed sample function in which the mean value (i.e., the signal) increases linearly over a few sample points as shown by the solid line in Figure 5-4. Because of noise added to the signal, the observed sample values are quite dispersed, as indicated by the crosses in this figure. The resulting outputs from moving window averages having two different window sizes are also displayed. It is clear that the larger window produces a smoother result, but that it also does not follow the true mean value as closely.

The moving window average usually produces good results when the mean value of the observed sample function is not changing rapidly, particularly if it is changing linearly. It does not produce good results if the mean value of the observed sample function has sharp peaks or is oscillatory. There are other techniques beyond the scope of our present discussion that do much better in these cases.

[Figure 5-4: Smoothing produced by two different window sizes. Legend: true signal (solid line), observed data (×), window size = 21, window size = 41. Horizontal axis: sample point number, 0 to 120.]
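The moving window average of (5-15) is simple to compute once the samples are stored. The sketch below is one possible Python version; returning `None` where a full window does not fit near either end of the record is an assumption, since the text does not say how the edges are treated. The test data only loosely imitate Figure 5-4: the ramp shape, noise level, and seed are hypothetical, and the window sizes 21 and 41 correspond to $n_L = n_R = 10$ and $n_L = n_R = 20$.

```python
import random

def moving_window_average(y, n_left, n_right):
    """Moving window average of (5-15): the mean of y[i - n_left] ... y[i + n_right].
    Points without a full window (near either end) are returned as None; this
    edge handling is an assumption, not something the text specifies."""
    width = n_left + n_right + 1
    out = []
    for i in range(len(y)):
        if i - n_left < 0 or i + n_right >= len(y):
            out.append(None)
        else:
            out.append(sum(y[i - n_left:i + n_right + 1]) / width)
    return out

# Hypothetical data loosely patterned on Figure 5-4: a signal that ramps from 0 to 1
# between samples 40 and 60, plus Gaussian noise with standard deviation 0.3.
rng = random.Random(5)
signal = [min(1.0, max(0.0, (i - 40) / 20)) for i in range(121)]
observed = [s + rng.gauss(0.0, 0.3) for s in signal]

smooth_21 = moving_window_average(observed, 10, 10)   # window size 21
smooth_41 = moving_window_average(observed, 20, 20)   # window size 41
for i in (30, 50, 70):
    print(i, round(signal[i], 2), round(smooth_21[i], 2), round(smooth_41[i], 2))
```

For a purely linear mean value and a symmetric window, the average of the noise-free part equals the center value exactly, which is one way to see why the method does well on slowly, linearly changing signals.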
- 216. PROBLEMS

5-1.1 A sample function from a random process is generated by rolling a die five times. During the interval from i - 1 to i the value of the sample function is equal to the outcome of the ith roll of the die.
a) Sketch the resulting sample function if the outcomes of the five rolls are 5, 2, 6, 4, 1.
b) How many different sample functions does the ensemble of this random process contain?
c) What is the probability that the particular sample function observed in part (a) will occur?
d) What is the probability that the sample function consisting entirely of threes will occur?

5-1.2 The random number generator in a computer generates three-digit numbers that are uniformly distributed between 0.000 and 0.999 at a rate of one random number per second starting at t = 0. A sample function from a random process is generated by summing the 10 most recent random numbers and assigning this sum as the value of the sample function during each 1-second time interval. The sample functions are denoted as X(t) for t ≥ 0.
a) Find the mean value of the random variable X(4.5).
b) Find the mean value of the random variable X(9.5).
c) Find the mean value of the random variable X(20.5).

5-2.1 Classify each of the following random processes as continuous, discrete, or mixed.
a) A random process in which the random variable is the number of cars per minute passing a given traffic counter.
b) The thermal noise voltage generated by a resistor.
c) The random process defined in Problem 5-1.2.
d) The random process that results when a Gaussian random process is passed through an ideal half-wave rectifier.
e) The random process that results when a Gaussian random process is passed through an ideal full-wave rectifier.
- 217. f) A random process having sample functions of the form
$$X(t) = A\cos(Bt + \theta)$$
where A is a constant, B is a random variable that is exponentially distributed from 0 to ∞, and θ is a random variable that is uniformly distributed between 0 and 2π.

5-2.2 A Gaussian random process having a mean value of 2 and a variance of 4 is passed through an ideal half-wave rectifier.
a) Let $X_p(t)$ represent the random process at the output of the half-wave rectifier if the positive portions of the input appear in the output. Determine the probability density function of $X_p(t)$.
b) Let $X_n(t)$ represent the random process at the output of the half-wave rectifier if the negative portions of the input appear in the output. Determine the probability density function of $X_n(t)$.
c) Determine the probability density function of $X_p(t)X_n(t)$.

5-3.1 State whether each of the random processes described in Problem 5-2.1 is deterministic or nondeterministic.

5-3.2 Sample functions from a deterministic random process are described by
$$X(t) = \begin{cases} At + B, & t \ge 0 \\ 0, & t < 0 \end{cases}$$
where A is a Gaussian random variable with zero mean and a variance of 9 and B is a random variable that is uniformly distributed between 0 and 6. A and B are statistically independent.
a) Find the mean value of this process.
b) Find the variance of this process.
c) If a particular sample function is found to have a value of 10 at t = 2 and a value of 20 at t = 4, find the value of the sample function at t = 8.

5-4.1 State whether each of the random processes described in Problem 5-2.1 may reasonably be considered to be stationary or nonstationary. If you describe a process as nonstationary, state the reason for this claim.

5-4.2 A random process has sample functions of the form
- 218. $$X(t) = A\cos(\omega t + \theta)$$
where A and ω are constants and θ is a random variable.
a) Prove that the process is stationary in the wide sense if θ is uniformly distributed between 0 and 2π.
b) Prove that this process is nonstationary if θ is not uniformly distributed over a range that is a multiple of 2π.

5-5.1 A random process has sample functions of the form
$$X(t) = A$$
where A is a Rayleigh distributed random variable with mean of 4.
a) Is this process wide sense stationary?
b) Is this process ergodic?

5-5.2 State whether each of the processes described in Problem 5-4.2 is ergodic or nonergodic and give reasons for your decision.

5-5.3 A random process has sample functions of the form
$$X(t) = \sum_{n=-\infty}^{\infty} A f(t - nT - t_0)$$
where A and T are constants and $t_0$ is a random variable that is uniformly distributed between 0 and T. The function f(t) is defined by
$$f(t) = 1, \qquad 0 \le t \le T/2$$
and zero elsewhere.
a) Find $\bar{X}$ and $\overline{X^2}$.
b) Find ⟨X⟩ and ⟨X²⟩, where ⟨ ⟩ denotes a time average.
c) Can this process be stationary?
d) Can this process be ergodic?
- 219. 5-6.1 A stationary random process is sampled at time instants separated by 0.01 seconds. The resulting sample values are tabulated below.

   i    x(i)       i    x(i)       i    x(i)
   0    0.19       7   -1.24      14    1.45
   1    0.29       8   -1.88      15   -0.82
   2    1.44       9   -0.31      16   -0.25
   3    0.83      10    1.18      17    0.23
   4   -0.01      11    1.70      18   -0.91
   5   -1.23      12    0.57      19   -0.19
   6   -1.47      13    0.95      20    0.24

a) Estimate the mean value of this process.
b) If the process has a true variance of 1.0, find the variance of your estimate of the mean.

5-6.2 Estimate the variance of the process in Problem 5-6.1.

5-6.3 Using a random number generator, generate 200 random numbers having a Gaussian distribution with mean of 10 and standard deviation of 5. From these numbers
a) estimate the mean of the process
b) estimate the variance of the process
c) estimate the standard deviation of the estimate of the mean
d) compare the estimates with the theoretical values.

References
See references for Chapter 1, particularly Davenport and Root, Gardner, Papoulis, and Helstrom.
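The estimators of Section 5-6 can be checked numerically. The sketch below is a minimal Python version, assuming independent Gaussian samples with mean 1 and variance 4 (arbitrary choices made only for the simulation). It implements (5-8) and (5-14), checks by simulation that the variance of the mean estimate is close to $\sigma_X^2/N$ as in (5-13), shows that the naive variance estimate (average of $X_i^2$ minus the squared estimated mean) averages to about $(N-1)/N$ times the true variance, and reproduces the N = 200 result of the square-law example.

```python
import math
import random
import statistics

def mean_estimate(xs):
    """Sample mean of (5-8): (1/N) * sum of X_i."""
    return sum(xs) / len(xs)

def variance_estimate(xs):
    """Unbiased variance estimate of (5-14):
    (1/(N-1)) * sum(X_i^2) - (N/(N-1)) * (mean estimate)^2."""
    n = len(xs)
    m = mean_estimate(xs)
    return sum(x * x for x in xs) / (n - 1) - n * m * m / (n - 1)

def naive_variance_estimate(xs):
    """Average of X_i^2 minus the squared mean estimate (the biased version)."""
    m = mean_estimate(xs)
    return sum(x * x for x in xs) / len(xs) - m * m

rng = random.Random(0)
true_mean, true_var = 1.0, 4.0          # assumed Gaussian test population
sigma = math.sqrt(true_var)

# (5-13): the variance of the mean estimate should be close to true_var / N.
N = 10
means = [mean_estimate([rng.gauss(true_mean, sigma) for _ in range(N)]) for _ in range(4000)]
print("Var(mean estimate):", round(statistics.pvariance(means), 3), "vs", true_var / N)

# (5-14) is unbiased; the naive estimate averages to about (N-1)/N * true_var.
N = 5
unbiased, naive = [], []
for _ in range(20_000):
    xs = [rng.gauss(true_mean, sigma) for _ in range(N)]
    unbiased.append(variance_estimate(xs))
    naive.append(naive_variance_estimate(xs))
print("mean of (5-14):", round(statistics.fmean(unbiased), 2))   # near 4.0
print("mean of naive:", round(statistics.fmean(naive), 2))       # near 3.2

# Square-law example: sigma_X^2 / (X_bar)^2 = 2 and the required relative
# standard deviation is 0.1, so N = 2 / 0.1**2.
print("required N:", round(2 / 0.1 ** 2))   # 200
```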
- 220. CHAPTER 6
Correlation Functions

6-1 Introduction

The subject of correlation between two random variables was introduced in Section 3-4. Now that the concept of a random process has also been introduced, it is possible to relate these two subjects to provide a statistical (rather than a probabilistic) description of random processes. Although a probabilistic description is the most complete one, since it incorporates all the knowledge that is available about a random process, there are many engineering situations in which this degree of completeness is neither needed nor possible. If the major interest in a random quantity is in its average power, or the way in which that power is distributed with frequency, then the entire probability model is not needed. If the probability distributions of the random quantities are not known, use of the probability model is not even possible. In either case, a partial statistical description, in terms of certain average values, may provide an acceptable substitute for the probability description.

It was noted in Section 3-4 that the correlation between two random variables was the expected value of their product. If the two random variables are defined as samples of a random process at two different time instants, then this expected value depends upon how rapidly the time functions can change. We would expect that the random variables would be highly correlated when the two time instants are very close together, because the time function cannot change rapidly enough to be greatly different. On the other hand, we would expect to find very little correlation between the values of the random variables when the two time instants are widely separated, because almost any change can take place.
Because the correlation does depend upon how rapidly the values of the random variable can change with respect to time, we expect that this correlation may also be related to the manner in which the energy content of a random process is distributed with respect to frequency. This is because a time function must have appreciable energy at high frequencies in order to be able to change rapidly with time. This aspect of random processes is discussed in more detail in subsequent chapters.
- 221. The previously defined correlation was simply a number since the random variables were not necessarily defined as being associated with time functions. In the following case, however, every pair of random variables can be related by the time separation between them, and the correlation will be a function of this separation. Thus, it becomes appropriate to define a correlation function in which the argument is the time separation of the two random variables. If the two random variables come from the same random process, this function will be known as the autocorrelation function. If they come from different random processes, it will be called the crosscorrelation function. We will consider autocorrelation functions first.

If X(t) is a sample function from a random process, and the random variables are defined to be
$$X_1 = X(t_1), \qquad X_2 = X(t_2)$$
then the autocorrelation function is defined to be
$$R_X(t_1, t_2) = E[X_1X_2] \qquad \text{(6-1)}$$
This definition is valid for both stationary and nonstationary random processes. However, our interest is primarily in stationary processes, for which further simplification of (6-1) is possible. It may be recalled from the previous chapter that for a wide-sense stationary process all such ensemble averages are independent of the time origin. Accordingly, for a wide-sense stationary process,
$$R_X(t_1, t_2) = R_X(t_1 + T, t_2 + T) = E[X(t_1 + T)X(t_2 + T)]$$
Since this expression is independent of the choice of time origin, we can set $T = -t_1$ to give
$$R_X(t_1, t_2) = R_X(0, t_2 - t_1) = E[X(0)X(t_2 - t_1)]$$
It is seen that this expression depends only on the time difference $t_2 - t_1$. Setting this time difference equal to $\tau = t_2 - t_1$ and suppressing the zero in the argument of $R_X(0, t_2 - t_1)$, we can rewrite (6-1) as
$$R_X(\tau) = E[X(t_1)X(t_1 + \tau)] \qquad \text{(6-2)}$$
This is the expression for the autocorrelation function of a stationary process and depends only on τ and not on the value of $t_1$.
Because of this lack of dependence on the particular time $t_1$ at which the ensemble averages are taken, it is common practice to write (6-2) without the subscript; thus,
$$R_X(\tau) = E[X(t)X(t + \tau)]$$
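Because (6-2) is an ensemble average, it can be approximated by drawing many independent sample functions and averaging the product $X(t_1)X(t_1 + \tau)$. The sketch below does this for an assumed example, a random-phase cosine $A\cos(\omega t + \theta)$ with θ uniform on [0, 2π), whose autocorrelation is worked out later in this chapter as $(A^2/2)\cos\omega\tau$. The amplitude, frequency, number of realizations, and seed are arbitrary. For this stationary process the estimate should not depend on the choice of $t_1$.

```python
import math
import random

def ensemble_autocorrelation(make_sample_function, t1, tau, realizations=20_000, seed=0):
    """Estimate R_X(t1, t1 + tau) = E[X(t1) X(t1 + tau)] by averaging the product
    over many independently drawn sample functions (an ensemble average)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(realizations):
        x = make_sample_function(rng)
        total += x(t1) * x(t1 + tau)
    return total / realizations

def random_phase_cosine(rng, A=1.0, omega=2.0):
    """Hypothetical example: one sample function A cos(omega t + theta),
    with theta drawn uniformly from [0, 2*pi)."""
    theta = rng.uniform(0.0, 2 * math.pi)
    return lambda t: A * math.cos(omega * t + theta)

tau = 0.5
for t1 in (0.0, 1.3, 5.0):
    estimate = ensemble_autocorrelation(random_phase_cosine, t1, tau)
    print(t1, round(estimate, 3), "closed form:", round(0.5 * math.cos(2.0 * tau), 3))
```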
- 222. Whenever correlation functions relate to nonstationary processes, since they are dependent on the particular time at which the ensemble average is taken as well as on the time difference between samples, they must be written as $R_X(t_1, t_2)$ or $R_X(t_1, \tau)$. In all cases in this and subsequent chapters, unless specifically stated otherwise, it is assumed that all correlation functions relate to wide-sense stationary random processes.

It is also possible to define a time autocorrelation function for a particular sample function as¹
$$\mathcal{R}_x(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)x(t + \tau)\,dt = \langle x(t)x(t + \tau)\rangle \qquad \text{(6-3)}$$
For the special case of an ergodic process, $\langle x(t)x(t + \tau)\rangle$ is the same for every x(t) and equal to $R_X(\tau)$. That is,
$$\mathcal{R}_x(\tau) = R_X(\tau) \quad \text{for an ergodic process} \qquad \text{(6-4)}$$
The assumption of ergodicity, where it is not obviously invalid, often simplifies the computation of correlation functions.

From (6-2) it is seen readily that for τ = 0, since $R_X(0) = E[X(t_1)X(t_1)]$, the autocorrelation function is equal to the mean-square value of the process. For values of τ other than τ = 0, the autocorrelation function $R_X(\tau)$ can be thought of as a measure of the similarity of the waveform X(t) and the waveform X(t + τ). To illustrate this point further, let X(t) be a sample function from a zero-mean stationary random process and form the new function
$$Y(t) = X(t) - \rho X(t + \tau)$$
By determining the value of ρ that minimizes the mean-square value of Y(t) we will have a measure of how much of the waveform X(t + τ) is contained in the waveform X(t). The determination of ρ is made by computing the variance of Y(t), setting the derivative of the variance with respect to ρ equal to zero, and solving for ρ.
The operations are as follows:
$$E\{[Y(t)]^2\} = E\{[X(t) - \rho X(t + \tau)]^2\} = E\{X^2(t) - 2\rho X(t)X(t + \tau) + \rho^2X^2(t + \tau)\}$$
$$\sigma_Y^2 = \sigma_X^2 - 2\rho R_X(\tau) + \rho^2\sigma_X^2$$
$$\frac{d\sigma_Y^2}{d\rho} = -2R_X(\tau) + 2\rho\sigma_X^2 = 0$$
$$\rho = \frac{R_X(\tau)}{\sigma_X^2} \qquad \text{(6-5)}$$
It is seen from (6-5) that ρ is directly related to $R_X(\tau)$ and is exactly the correlation coefficient defined in Section 3-4. The coefficient ρ can be thought of as the fraction of the waveshape

¹ The symbol ⟨ ⟩ is used to denote time averaging.
- 223. of X(t) remaining after τ seconds has elapsed. It must be remembered that ρ was calculated on a statistical basis, and that it is the average retention of waveshape over the ensemble, and not this property in any particular sample function, that is important. As shown previously, the correlation coefficient ρ can vary from +1 to -1. For a value of ρ = 1, the waveshapes would be identical, that is, completely correlated. For ρ = 0, the waveforms would be completely uncorrelated; that is, no part of the waveform X(t + τ) would be contained in X(t). For ρ = -1, the waveshapes would be identical, except for opposite signs; that is, the waveform X(t + τ) would be the negative of X(t).

For an ergodic process or for nonrandom signals, the foregoing interpretation can be made in terms of average power instead of variance and in terms of the time correlation function instead of the ensemble correlation function.

Since $R_X(\tau)$ is dependent both on the amount of correlation ρ and the variance of the process, $\sigma_X^2$, it is not possible to estimate the significance of some particular value of $R_X(\tau)$ without knowing one or the other of these quantities. For example, if the random process has a zero mean and the autocorrelation function has a positive value, the most that can be said is that the random variables $X(t_1)$ and $X(t_1 + \tau)$ probably have the same sign.² If the autocorrelation function has a negative value, it is likely that the random variables have opposite signs. If it is nearly zero, the random variables are about as likely to have opposite signs as they are to have the same sign.

Exercise 6-1.1
A random process has sample functions of the form
$$X(t) = \begin{cases} A, & 0 \le t \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
where A is a random variable that is uniformly distributed from 0 to 10. Using
Usingthe basic definition of the autocorrelation function as given by Equation (6-1 ), find the autocorrelation function of this process.Answer:Rx(t1, t2) = 33.3= 0 elsewhere2 This is strictly true only if f(x1 ) is symmetrical about the axis x1 = 0.
- 224. Exercise 6-1.2
Define a random variable Z(t) as
$$Z(t) = X(t) + X(t + \tau_1)$$
where X(t) is a sample function from a stationary random process whose autocorrelation function is
$$R_X(\tau) = \exp(-\tau^2)$$
Write an expression for the autocorrelation function of the random process Z(t).

Answer:
$$R_Z(\tau) = 2\exp(-\tau^2) + \exp[-(\tau + \tau_1)^2] + \exp[-(\tau - \tau_1)^2]$$

6-2 Example: Autocorrelation Function of a Binary Process

The above ideas may be made somewhat clearer by considering, as a special example, a random process having a very simple autocorrelation function. Figure 6-1 shows a typical sample function from a discrete, stationary, zero-mean random process in which only two values, ±A, are possible. The sample function either can change from one value to the other every $t_a$ seconds or remain the same, with equal probability. The time $t_0$ is a random variable with respect to the ensemble of possible time functions and is uniformly distributed over an interval of length $t_a$. This means, as far as the ensemble is concerned, that changes in value can occur at any time with equal probability. It is also assumed that the value of X(t) in any one interval is statistically independent of its value in any other interval.

Although the random process described in the above paragraph may seem contrived, it actually represents a very practical situation. In modern digital communication systems, the messages to be conveyed are converted into binary symbols. This is done by first sampling the message at periodic time instants and then quantizing the samples into a finite number of amplitude levels as discussed in Section 2-7 in connection with the uniform probability density function. Each amplitude level is then represented by a block of binary symbols; for example, 256 amplitude levels can each be uniquely represented by a block of 8 binary symbols. The binary symbols can in turn be represented by a voltage level of +A or -A.
Thus, a sequence of binary symbols becomes a waveform of the type shown in Figure 6-1. Similarly, this waveform is typical of those found in digital computers or in communication links connecting computers together. Hence,
- 225. the random process being considered here is not only one of the simplest ones to analyze, but is also one of the most practical ones in the real world.

[Figure 6-1: A discrete, stationary sample function X(t) taking the values ±A, with possible transitions at $t_0 - t_a$, $t_0$, $t_0 + t_a$, ..., $t_0 + 4t_a$; the samples $X(t_1) = X_1$ and $X(t_1 + \tau) = X_2$ are marked.]
[Figure 6-2: Autocorrelation function of the process in Figure 6-1.]

The autocorrelation function of this process will be determined by heuristic arguments rather than by rigorous derivation. In the first place, when |τ| is larger than $t_a$, then $t_1$ and $t_1 + \tau = t_2$ cannot lie in the same interval, and $X_1$ and $X_2$ are statistically independent. Since $X_1$ and $X_2$ have zero mean, the expected value of their product must be zero, as shown by (3-22); that is,
$$R_X(\tau) = E[X_1X_2] = \bar{X}_1\bar{X}_2 = 0, \qquad |\tau| > t_a$$
since $\bar{X}_1 = \bar{X}_2 = 0$. When |τ| is less than $t_a$, then $t_1$ and $t_1 + \tau$ may or may not be in the same interval, depending upon the value of $t_0$. Since $t_0$ can be anywhere, with equal probability, the probability that they do lie in the same interval is proportional to the difference between $t_a$ and |τ|. In particular, for τ ≥ 0, it is seen that $t_0 \le t_1 \le t_1 + \tau < t_0 + t_a$, which yields $t_1 + \tau - t_a < t_0 \le t_1$. Hence,
$$\Pr(t_1 \text{ and } t_1 + \tau \text{ are in the same interval}) = \Pr[t_1 + \tau - t_a < t_0 \le t_1] = \frac{1}{t_a}\left[t_1 - (t_1 + \tau - t_a)\right] = \frac{t_a - \tau}{t_a}$$
- 226. since the probability density function for $t_0$ is just $1/t_a$. When τ < 0, it is seen that $t_0 \le t_1 + \tau \le t_1 < t_0 + t_a$, which yields $t_1 - t_a < t_0 \le t_1 + \tau$. Thus,
$$\Pr(t_1 \text{ and } t_1 + \tau \text{ are in the same interval}) = \Pr[(t_1 - t_a) < t_0 \le (t_1 + \tau)] = \frac{1}{t_a}\left[t_1 + \tau - (t_1 - t_a)\right] = \frac{t_a + \tau}{t_a}$$
Hence, in general,
$$\Pr(t_1 \text{ and } t_1 + \tau \text{ are in the same interval}) = \frac{t_a - |\tau|}{t_a}$$
When they are in the same interval, the product of $X_1$ and $X_2$ is always $A^2$; when they are not, the expected product is zero. Hence,
$$R_X(\tau) = \begin{cases} A^2\left[\dfrac{t_a - |\tau|}{t_a}\right] = A^2\left[1 - \dfrac{|\tau|}{t_a}\right], & |\tau| \le t_a \\ 0, & |\tau| > t_a \end{cases} \qquad \text{(6-6)}$$
This function is sketched in Figure 6-2.

It is interesting to consider the physical interpretation of this autocorrelation function in light of the previous discussion. Note that when |τ| is small (less than $t_a$), there is an increased probability that $X(t_1)$ and $X(t_1 + \tau)$ will have the same value, and the autocorrelation function is positive. When |τ| is greater than $t_a$, it is equally probable that $X(t_1)$ and $X(t_1 + \tau)$ will have the same value as that they will have opposite values, and the autocorrelation function is zero. For τ = 0 the autocorrelation function yields the mean-square value of $A^2$.

Exercise 6-2.1
A speech waveform is sampled 4000 times a second and each sample is quantized into 256 amplitude levels. The resulting amplitude levels are represented by a binary voltage having values of ±5. Assuming that successive binary symbols are statistically independent, write the autocorrelation function of the binary process.

Answer:
$$R_X(\tau) = \begin{cases} 25\left[1 - 32{,}000|\tau|\right], & 0 \le |\tau| \le \dfrac{1}{32{,}000} \\ 0, & \text{elsewhere} \end{cases}$$
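The heuristic argument leading to (6-6) can be checked by simulating the ensemble directly. The sketch below draws many sample functions of the binary process (a uniformly distributed timing offset $t_0$ and independent ±A levels in different intervals), averages $X(t_1)X(t_1 + \tau)$, and compares the result with the triangle of (6-6). The values A = 1, $t_a$ = 1, $t_1$ = 0.3, the number of trials, and the seed are arbitrary assumptions. The last line evaluates the closed form for Exercise 6-2.1, where 4000 samples per second times 8 binary symbols per sample gives 32,000 symbols per second.

```python
import math
import random

def binary_process_autocorrelation(tau, A=1.0, ta=1.0, t1=0.3, trials=40_000, seed=0):
    """Monte Carlo ensemble estimate of E[X(t1) X(t1 + tau)] for the binary process
    of Section 6-2: levels +/-A held for intervals of length ta, interval boundaries
    at t0 + k*ta with t0 uniform on [0, ta), independent levels in different intervals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t0 = rng.uniform(0.0, ta)
        k1 = math.floor((t1 - t0) / ta)          # index of the interval containing t1
        k2 = math.floor((t1 + tau - t0) / ta)    # index of the interval containing t1 + tau
        x1 = A * rng.choice((-1, 1))
        x2 = x1 if k1 == k2 else A * rng.choice((-1, 1))
        total += x1 * x2
    return total / trials

def triangle(tau, A=1.0, ta=1.0):
    """Closed form (6-6): A^2 (1 - |tau|/ta) for |tau| <= ta, zero otherwise."""
    return A * A * max(0.0, 1.0 - abs(tau) / ta)

for tau in (-1.5, -0.5, 0.0, 0.25, 0.75, 1.5):
    print(tau, round(binary_process_autocorrelation(tau), 3), triangle(tau))

# Exercise 6-2.1: ta = 1/32,000 s and A = 5.
print(triangle(1 / 64_000, A=5.0, ta=1 / 32_000))   # 12.5, halfway down the triangle
```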
- 227. Exercise 6-2.2
[Figure: sample function x(t) consisting of pulses of amplitude ±A.]
A sample function from a stationary random process is shown above. The quantity $t_0$ is a random variable that is uniformly distributed from 0 to $t_a$ and the pulse amplitudes are ±A with equal probability and are independent from pulse to pulse. Find the autocorrelation function of this process.

Answer:
$$R_X(\tau) = \begin{cases} A^2\dfrac{b}{t_a}\left[1 - \dfrac{|\tau|}{b}\right], & |\tau| \le b \\ 0, & |\tau| > b \end{cases}$$

6-3 Properties of Autocorrelation Functions

If autocorrelation functions are to play a useful role in representing random processes and in the analysis of systems with random inputs, it is necessary to be able to relate the properties of the autocorrelation function to the properties of the random process it represents. In this section, a number of the properties that are possessed by all autocorrelation functions of stationary and ergodic random processes are summarized. The student should pay particular attention to these properties because they will come up many times in future discussions.

1. $R_X(0) = \overline{X^2}$. Hence, the mean-square value of the random process can always be obtained simply by setting τ = 0.

It should be emphasized that $R_X(0)$ gives the mean-square value whether the process has a nonzero mean value or not. If the process is zero mean, then the mean-square value is equal to the variance of the process.

2. $R_X(\tau) = R_X(-\tau)$. The autocorrelation function is an even function of τ.

This is most easily seen, perhaps, by thinking of the time-averaged autocorrelation function, which is the same as the ensemble-averaged autocorrelation function for an ergodic random process. In this case, the time average is taken over exactly the same product function regardless
of which direction one of the time functions is shifted. This symmetry property is extremely useful in deriving the autocorrelation function of a random process because it implies that the derivation needs to be carried out only for positive values of $\tau$ and the result for negative $\tau$ determined by symmetry. Thus, in the derivation shown in the example in Section 6-2, it would have been necessary to consider only the case for $\tau \ge 0$. For a nonstationary process, the symmetry property does not necessarily apply.

3. $|R_X(\tau)| \le R_X(0)$. The largest value of the autocorrelation function always occurs at $\tau = 0$. There may be other values of $\tau$ for which it is just as big (for example, see the periodic case below), but it cannot be larger. This is shown easily by considering

$$E[(X_1 \pm X_2)^2] = E[X_1^2 + X_2^2 \pm 2X_1X_2] \ge 0$$

$$E[X_1^2 + X_2^2] = 2R_X(0) \ge |E(2X_1X_2)| = |2R_X(\tau)|$$

and thus,

$$R_X(0) \ge |R_X(\tau)| \tag{6-7}$$

4. If $X(t)$ has a dc component or mean value, then $R_X(\tau)$ will have a constant component. For example, if $X(t) = A$, then

$$R_X(\tau) = E[X(t_1)X(t_1 + \tau)] = E[AA] = A^2$$

More generally, if $X(t)$ has a mean value and a zero-mean component $N(t)$ so that

$$X(t) = \bar{X} + N(t) \tag{6-8}$$

then

$$R_X(\tau) = E\{[\bar{X} + N(t_1)][\bar{X} + N(t_1 + \tau)]\} = E[(\bar{X})^2 + \bar{X}N(t_1) + \bar{X}N(t_1 + \tau) + N(t_1)N(t_1 + \tau)] = (\bar{X})^2 + R_N(\tau) \tag{6-9}$$

since

$$E[N(t_1)] = E[N(t_1 + \tau)] = 0$$

Thus, even in this case, $R_X(\tau)$ contains a constant component.
For ergodic processes the magnitude of the mean value of the process can be determined by looking at the autocorrelation function as $\tau$ approaches infinity, provided that any periodic components in the autocorrelation function are ignored in the limit. Since only the square of the mean value is obtained from this calculation, it is not possible to determine the sign of the mean value. If the process is stationary, but not ergodic, the value of $R_X(\tau)$ may not yield any
information regarding the mean value. For example, a random process having sample functions of the form

$$X(t) = A$$

where $A$ is a random variable with zero mean and variance $\sigma_A^2$, has an autocorrelation function of

$$R_X(\tau) = \sigma_A^2$$

for all $\tau$. Thus, the autocorrelation function does not vanish at $\tau = \infty$ even though the process has zero mean. This strange result is a consequence of the process being nonergodic and would not occur for an ergodic process.

5. If $X(t)$ has a periodic component, then $R_X(\tau)$ will also have a periodic component, with the same period. For example, let

$$X(t) = A\cos(\omega t + \theta)$$

where $A$ and $\omega$ are constants and $\theta$ is a random variable uniformly distributed over a range of $2\pi$. That is,

$$f(\theta) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \theta \le 2\pi \\ 0, & \text{elsewhere} \end{cases}$$

Then

$$\begin{aligned} R_X(\tau) &= E[A\cos(\omega t_1 + \theta)\,A\cos(\omega t_1 + \omega\tau + \theta)] \\ &= E\left[\frac{A^2}{2}\cos(2\omega t_1 + \omega\tau + 2\theta) + \frac{A^2}{2}\cos\omega\tau\right] \\ &= \frac{A^2}{2}\int_0^{2\pi}\frac{1}{2\pi}\left[\cos(2\omega t_1 + \omega\tau + 2\theta) + \cos\omega\tau\right]d\theta \\ &= \frac{A^2}{2}\cos\omega\tau \end{aligned}$$

In the more general case, in which

$$X(t) = A\cos(\omega t + \theta) + N(t) \tag{6-10}$$

where $\theta$ and $N(t_1)$ are statistically independent for all $t_1$, by the method used in obtaining (6-9), it is easy to show that
$$R_X(\tau) = \frac{A^2}{2}\cos\omega\tau + R_N(\tau) \tag{6-11}$$

Hence, the autocorrelation function still contains a periodic component.
The above property can be extended to consider random processes that contain any number of periodic components. If the random variables associated with the periodic components are statistically independent, then the autocorrelation function of the sum of the periodic components is simply the sum of the periodic autocorrelation functions of each component. This statement is true regardless of whether the periodic components are harmonically related or not.
If every sample function of the random process is periodic and can be represented by a Fourier series, the resulting autocorrelation is also periodic and can also be represented by a Fourier series. However, this Fourier series will include more than just the sum of the autocorrelation functions of the individual terms if the random variables associated with the various components of the sample function are not statistically independent. A common situation in which the random variables are not independent is the case in which there is only one random variable for the process, namely a random delay on each sample function that is uniformly distributed over the fundamental period.

6. If $\{X(t)\}$ is ergodic and zero mean, and has no periodic components, then

$$\lim_{|\tau| \to \infty} R_X(\tau) = 0 \tag{6-12}$$

For large values of $|\tau|$, since the effect of past values tends to die out as time progresses, the random variables tend to become statistically independent.

7. Autocorrelation functions cannot have an arbitrary shape. One way of specifying shapes that are permissible is in terms of the Fourier transform of the autocorrelation function. That is, if

$$\mathcal{F}[R_X(\tau)] = \int_{-\infty}^{\infty} R_X(\tau)e^{-j\omega\tau}\,d\tau$$

then the restriction is

$$\mathcal{F}[R_X(\tau)] \ge 0 \quad \text{all } \omega \tag{6-13}$$

The reason for this restriction will become apparent after the discussion of spectral density in Chapter 7.
Among other things, this restriction precludes the existence of autocorrelation functions with flat tops, vertical sides, or any discontinuity in amplitude.

There is one further point that should be emphasized in connection with autocorrelation functions. Although a knowledge of the joint probability density functions of the random process is sufficient to obtain a unique autocorrelation function, the converse is not true. There may be many different random processes that can yield the same autocorrelation function. Furthermore, as will be shown later, the effect of linear systems on the autocorrelation function of the input can be computed without knowing anything about the probability density functions. Hence, the
specification of the correlation function of a random process is not equivalent to the specification of the probability density functions and, in fact, represents a considerably smaller amount of information.

Exercise 6-3.1

a) An ergodic random process has an autocorrelation function of the form

$$R_X(\tau) = 9e^{-4|\tau|} + 16\cos 10\tau + 16$$

Find the mean-square value, mean value, and variance of this process.

b) An ergodic random process has an autocorrelation function of the form

$$R_X(\tau) = \frac{4\tau^2 + 6}{\tau^2 + 1}$$

Find the mean-square value, mean value, and variance of this process.

Answers: 2, 6, 41, ±2, ±4, 33

Exercise 6-3.2

For each of the following functions of $\tau$, determine the largest value of the constant $A$ for which the function could be a valid autocorrelation function:

a) $e^{-4|\tau|} - Ae^{-2|\tau|}$
b) $e^{-|\tau + A|}$
c) $10\cos(2\tau) - A\cos(\tau)$

Answers: 0, 2, 0

6-4 Measurement of Autocorrelation Functions

Since the autocorrelation function plays an important role in the analysis of linear systems with random inputs, an important practical problem is that of determining these functions for
experimentally observed random processes. In general, they cannot be calculated from the joint density functions, since these density functions are seldom known. Nor can an ensemble average be made, because there is usually only one sample function from the ensemble available. Under these circumstances, the only available procedure is to calculate a time autocorrelation function for a finite time interval, under the assumption that the process is ergodic.

To illustrate this, assume that a particular voltage or current waveform $x(t)$ has been observed over a time interval from 0 to $T$ seconds. It is then possible to define an estimated correlation function for this particular waveform as

$$\hat{\mathcal{R}}_x(\tau) = \frac{1}{T - \tau}\int_0^{T - \tau} x(t)x(t + \tau)\,dt \tag{6-14}$$

Over the ensemble of sample functions, this estimate is a random variable denoted by $\hat{R}_X(\tau)$. Note that the averaging time is $T - \tau$ rather than $T$ because this is the only portion of the observed data in which both $x(t)$ and $x(t + \tau)$ are available.

In most practical cases it is not possible to carry out the integration called for in (6-14) because a mathematical expression for $x(t)$ is not available. An alternative procedure is to approximate the integral by sampling the continuous time function at discrete instants of time and performing the discrete equivalent to (6-14). Thus, if the samples of a particular sample function are taken at time instants of $0, \Delta t, 2\Delta t, \ldots, N\Delta t$, and if the corresponding values of $x(t)$ are $x_0, x_1, x_2, \ldots$
, $x_N$, the discrete equivalent to (6-14) is

$$\hat{\mathcal{R}}_x(n\Delta t) = \frac{1}{N - n + 1}\sum_{k=0}^{N-n} x_k x_{k+n}, \qquad n = 0, 1, 2, \ldots, M, \quad M \ll N \tag{6-15}$$

This estimate is also a random variable over the ensemble and, as such, is denoted by $\hat{R}_X(n\Delta t)$. Since $N$ is quite large (on the order of several thousand), this operation is best performed by a digital computer.

To evaluate the quality of this estimate it is necessary to determine the mean and the variance of $\hat{R}_X(n\Delta t)$, since it is a random variable whose precise value depends upon the particular sample function being used and the particular set of samples taken. The mean is easy to obtain since

$$E[\hat{R}_X(n\Delta t)] = E\left[\frac{1}{N - n + 1}\sum_{k=0}^{N-n} X_k X_{k+n}\right] = \frac{1}{N - n + 1}\sum_{k=0}^{N-n} E[X_k X_{k+n}] = \frac{1}{N - n + 1}\sum_{k=0}^{N-n} R_X(n\Delta t) = R_X(n\Delta t)$$
Thus, the expected value of the estimate is the true value of the autocorrelation function and this is an unbiased estimate of the autocorrelation function.

Although the estimate described by (6-15) is unbiased, it is not necessarily the best estimate in the mean-square error sense and is not the form that is most commonly used. Instead it is customary to use

$$\hat{\mathcal{R}}_x(n\Delta t) = \frac{1}{N + 1}\sum_{k=0}^{N-n} x_k x_{k+n}, \qquad n = 0, 1, 2, \ldots, M \tag{6-16}$$

This is a biased estimate, as can be seen readily from the evaluation of $E[\hat{R}_X(n\Delta t)]$ given above for the estimate of (6-15). Since only the factor by which the sum is divided is different in the present case, the expected value of this new estimate is simply

$$E[\hat{R}_X(n\Delta t)] = \left[1 - \frac{n}{N + 1}\right]R_X(n\Delta t)$$

Note that if $N \gg n$, the bias is small. Although this estimate is biased, in most cases the total mean-square error is slightly less than for the estimate of (6-15). Furthermore, (6-16) is slightly easier to calculate.

It is much more difficult to determine the variance of the estimate, and the details of this are beyond the scope of the present discussion. It is possible to show, however, that the variance of the estimate must be smaller than

$$\operatorname{Var}[\hat{R}_X(n\Delta t)] \le \frac{2}{N}\sum_{k=-M}^{M} R_X^2(k\Delta t) \tag{6-17}$$

This expression for the variance assumes that the $2M + 1$ estimated values of the autocorrelation function span the region in which the autocorrelation function has a significant amplitude. If the value of $(2M + 1)\Delta t$ is too small, the variance given by (6-17) may be too small.
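The two discrete estimators differ only in their divisor, which a few lines of code make concrete. In this hedged sketch (standard library only; the names `unbiased_acf`, `biased_acf`, and `variance_bound` are hypothetical), `unbiased_acf` implements (6-15) and `biased_acf` implements (6-16) for samples $x_0, \ldots, x_N$. A constant sequence has the same autocorrelation at every lag, so feeding it to both functions exposes the bias factor $1 - n/(N+1)$ of (6-16) exactly. `variance_bound` evaluates the right side of (6-17) from autocorrelation values at lags $-M, \ldots, M$.

```python
def unbiased_acf(x, max_lag):
    """Estimate (6-15): divide each lagged sum by its number of terms, N - n + 1."""
    n_samples = len(x)                       # N + 1 samples: x_0 ... x_N
    return [sum(x[k] * x[k + n] for k in range(n_samples - n)) / (n_samples - n)
            for n in range(max_lag + 1)]

def biased_acf(x, max_lag):
    """Estimate (6-16): divide every lagged sum by N + 1, the total sample count."""
    n_samples = len(x)
    return [sum(x[k] * x[k + n] for k in range(n_samples - n)) / n_samples
            for n in range(max_lag + 1)]

def variance_bound(r_values, n_samples):
    """Right side of (6-17): (2/N) times the sum of R_X^2(k*dt), k = -M..M."""
    return 2.0 / n_samples * sum(r * r for r in r_values)

# Constant data: every lagged product is 4, so R_X(n*dt) = 4 at all lags.
x = [2.0] * 11                               # N = 10, i.e. 11 samples
print(unbiased_acf(x, 3))                    # [4.0, 4.0, 4.0, 4.0]
print(biased_acf(x, 3))                      # shrinks by the factor 1 - n/11
```

The unbiased form stays at 4 for every lag, while the biased form falls off linearly with $n$, exactly as the expectation above predicts; for random data the same divisor difference applies on average.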
If the mathematical form of the autocorrelation function is known, or can be deduced from the measurements that are made, a more accurate measure of the variance of the estimate is

$$\operatorname{Var}[\hat{R}_X(n\Delta t)] \le \frac{2}{T}\int_{-\infty}^{\infty} R_X^2(\tau)\,d\tau \tag{6-18}$$

where $T = N\Delta t$ is the length of the observed sample.

As an illustration of what this result means in terms of the number of samples required for a given degree of accuracy, suppose that it is desired to estimate a correlation function of the form shown in Figure 6-2 with four points on either side of center ($M = 4$). If an rms error of 5%³ or less is required, then (6-17) implies that (since $t_a = 4\Delta t$)

³ This implies that the standard deviation of the estimate should be no greater than 5% of the true mean value of the random variable $\hat{R}_X(n\Delta t)$.
$$(0.05A^2)^2 \ge \frac{2}{N}\sum_{k=-4}^{4} A^4\left[1 - \frac{|k|\Delta t}{4\Delta t}\right]^2$$

This can be solved for $N$ to obtain

$$N \ge 2200$$

It is clear that long samples of data and extensive calculations are necessary if accurate estimates of correlation functions are to be made.

The Student Edition of MATLAB does not have a function for computing the autocorrelation function of a vector of data samples. However, there are several ways to readily accomplish the calculation. The one considered here makes use of the convolution function, and a method described in Chapter 7 makes use of the fast Fourier transform. The raw convolution of two vectors, a and b, of data leads to a new vector of data whose elements are of the form

$$c(k) = \sum_j a(j)\,b(k - j)$$

where the summation is taken over all values of $j$ for which $a(j)$ and $b(k - j)$ are valid elements of the vectors of data. The most widely used estimate of the autocorrelation function, i.e., the biased estimate, has elements of the form

$$R(k) = \frac{1}{N + 1}\sum_j a(j)\,a(j - k), \qquad k = 0, 1, 2, \ldots, N - 1$$

Thus the autocorrelation function can be computed by convolution of the data vector with a reversed copy of itself and weighting the result with the factor $1/(N + 1)$. The following special MATLAB function carries out this calculation.

```matlab
function [ndt,R] = corb(a,b,f)
% corb.m  biased correlation function
% a, b are equal-length sampled time functions
% f is the sampling frequency
% ndt is the lag value for +/- time delays
N = length(a);
R = conv(a,fliplr(b))/(N+1);   % biased correlation estimate
ndt = (-(N-1):N-1)*1/f;        % lag values in seconds
```

This function calculates values of $R(n\Delta t)$ for $-(N - 1) \le n \le (N - 1)$, for a total of $2N - 1$ elements. The maximum value occurs at $R(N)$, corresponding to $R_X(0)$, and the autocorrelation function is symmetrical about this point. As an example of the use of this function, it will be
used to estimate the autocorrelation function of a sample of a Gaussian random process. The MATLAB program is straightforward, as follows.

```matlab
% corxmp1.m  example of autocorrelation calculation
rand('seed',1000);          % use seed to make repeatable
x = 10*randn(1,1001);       % generate random samples
t1 = 0:.001:1;              % sampling index
[t,R] = corb(x,x,1000);     % autocorrelation
subplot(2,1,1); plot(t1,x); xlabel('TIME'); ylabel('X')
subplot(2,1,2); plot(t,R); xlabel('LAG'); ylabel('Rx')
```

The resulting sample function and autocorrelation function are shown in Figure 6-3. It is seen that the autocorrelation function is essentially zero away from the origin, where it is concentrated. This is characteristic of signals whose samples are uncorrelated, as they are in this case. From the program, it is seen that the standard deviation of the random signal is 10 and, therefore, the variance is 100, corresponding to a lag of zero on the graph of the autocorrelation function.

Figure 6-3 Sample function and autocorrelation function of uncorrelated noise (upper panel: X versus TIME; lower panel: Rx versus LAG).
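For readers without MATLAB, the calculation performed by `corb` can be sketched in plain Python. This is a hedged, hypothetical port rather than the book's code: `biased_xcorr` evaluates the lagged sums directly instead of calling a convolution routine, keeps the same $1/(N+1)$ weighting, and returns lags $-(N-1), \ldots, N-1$ in seconds. It follows this chapter's convention that the second argument is the one shifted by $+n$, i.e. it computes $\frac{1}{N+1}\sum_k a_k b_{k+n}$. As we read `corb`, its `conv(a,fliplr(b))` form places the mirror-image sum at lag $n$; the two agree for autocorrelation (an even function) but the lag axis is reversed when a and b differ. The usage lines mimic `corxmp1.m`: for uncorrelated samples with standard deviation 10, the zero-lag value should be near 100 and the other lags near zero.

```python
import random

def biased_xcorr(a, b, fs):
    """Biased correlation estimate in the spirit of corb.m (hypothetical port).

    Returns (lags_in_seconds, values) for lags n = -(N-1) ... N-1, where
    value(n) = (1/(N+1)) * sum_k a[k] * b[k+n] over all valid k.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have equal length")
    n_len = len(a)
    lags, values = [], []
    for n in range(-(n_len - 1), n_len):
        # valid k satisfy 0 <= k < N and 0 <= k + n < N
        k_start, k_stop = max(0, -n), min(n_len, n_len - n)
        s = sum(a[k] * b[k + n] for k in range(k_start, k_stop))
        lags.append(n / fs)
        values.append(s / (n_len + 1))
    return lags, values

# Analogue of corxmp1.m: 1001 uncorrelated Gaussian samples, std 10, fs = 1000 Hz.
rng = random.Random(1000)
x = [10.0 * rng.gauss(0.0, 1.0) for _ in range(1001)]
lags, r = biased_xcorr(x, x, 1000.0)
zero = len(x) - 1                        # index of the zero-lag value
print(len(r), lags[zero], round(r[zero], 1))
```

The direct double loop costs on the order of $N^2$ multiplications, which is fine for a thousand samples; for long records the FFT method mentioned for Chapter 7 is the usual choice.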
Consider now an example in which the samples are not uncorrelated. The data vector will be obtained from that used in the previous example by carrying out a running average of the data, with the average extending over 51 points. The program that carries out this calculation is as follows.

```matlab
% corxmp2.m  example 2 of autocorrelation calculation
rand('seed',1000);
x1 = 10*randn(1,1001);
h = (1/51)*ones(1,51);
x2 = conv(x1,h);            % length of vector is 1001+51-1
x = x2(25:25+1000);         % keep vector length at 1001
t1 = 0:.001:1;              % sampling index
[t,R] = corb(x,x,1000);     % autocorrelation
subplot(2,1,1); plot(t1,x); xlabel('TIME'); ylabel('X')
subplot(2,1,2); plot(t,R); xlabel('LAG'); ylabel('Rx')
```

Figure 6-4 shows the resulting sample function and the autocorrelation function. It is seen that there is considerably more correlation away from the origin and the mean-square value is reduced. The reduction in mean-square value occurs because the convolution with the rectangular function is a type of low-pass filtering operation that eliminates energy from the high-frequency components in the waveform, as can be seen in the upper part of Figure 6-4.

The standard deviation of the autocorrelation estimate in the example of Figure 6-4 can be found using (6-17). The MATLAB program for this is as follows.

```matlab
% corxmp3.m  calc. of standard deviation of correlation estimate
M = length(R);
V = (2/M)*sum(R.^2);
S = sqrt(V)
```

The result is S = 0.3637. It is evident that a much longer sample would be required if a high degree of accuracy was desired.

Exercise 6-4.1

An ergodic random process has an autocorrelation function of the form

$$R_X(\tau) = 10e^{-2|\tau|}$$

a) Over what range of $\tau$-values must the autocorrelation function of this process be estimated in order to include all values of $R_X(\tau)$ greater than 1% of the maximum?
Figure 6-4 Autocorrelation function of partially correlated noise (upper panel: X versus TIME; lower panel: Rx versus LAG).

b) If 23 estimates ($M = 22$) of the autocorrelation function are to be made in the interval specified in (a), what should the sampling interval be?

c) How many sample values of the random process are required so that the rms error of the estimate is less than 5% of the true maximum value of the autocorrelation function?

Answers: 0.1, 2.3, 4053

Exercise 6-4.2

Using the variance bound given by the integral of (6-18), find the number of sample points required for the autocorrelation function estimate of Exercise 6-4.1.

Answer: 20001
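The sample-size arithmetic of this section can be reproduced with a short script. This is a hedged sketch, and `samples_needed` is a hypothetical helper: it solves (6-17) for $N$, given autocorrelation values at lags $-M, \ldots, M$ and an allowed rms error expressed as a fraction of $R_X(0)$. The first call repeats the Figure 6-2 example ($M = 4$, $t_a = 4\Delta t$, 5% rms error, $A = 1$ without loss of generality). The remaining lines work Exercise 6-4.1 under the assumptions $R_X(\tau) = 10e^{-2|\tau|}$ and a sampling interval of 0.1 s (about $2.3/22$, rounded).

```python
import math

def samples_needed(r_values, r_max, rms_fraction):
    """Solve (6-17) for N: (rms_fraction * r_max)**2 >= (2/N) * sum(R^2)."""
    return 2.0 * sum(r * r for r in r_values) / (rms_fraction * r_max) ** 2

# Figure 6-2 example: triangular R_X with A = 1, M = 4, t_a = 4*dt, 5% rms error.
tri = [1.0 - abs(k) / 4.0 for k in range(-4, 5)]
print(round(samples_needed(tri, 1.0, 0.05)))           # 2200

# Exercise 6-4.1, assuming R_X(tau) = 10 exp(-2|tau|).
def r_exp(tau):
    return 10.0 * math.exp(-2.0 * abs(tau))

tau_max = math.log(100.0) / 2.0     # (a) where R_X falls to 1% of R_X(0) = 10
dt = 0.1                            # (b) assumption: about tau_max / 22, rounded
lags = [r_exp(k * dt) for k in range(-22, 23)]
n_c = samples_needed(lags, 10.0, 0.05)                  # (c) 5% of R_X(0)
print(round(tau_max, 2), dt, math.ceil(n_c))            # 2.3 0.1 4053
```

Both results match the figures quoted in the text: 2200 samples for the triangular case, and 2.3 s, 0.1 s, and 4053 samples for Exercise 6-4.1.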
6-5 Examples of Autocorrelation Functions

Before going on to consider crosscorrelation functions, it is worthwhile to look at some typical autocorrelation functions, suggest the circumstances under which they might arise, and list possible applications. This discussion is not intended to be exhaustive, but is intended primarily to introduce some ideas.

The triangular correlation function shown in Figure 6-2 is typical of random binary signals in which the switching must occur at uniformly spaced time intervals. Such a signal arises in many types of communication and control systems in which the continuous signals are sampled at periodic instants of time and the resulting sample amplitudes converted to binary numbers. The correlation function shown in Figure 6-2 assumes that the random process has a mean value of zero, but this is not always the case. If, for example, the random signal could assume values of $A$ and 0 (rather than $-A$), then the process has a mean value of $A/2$ and a mean-square value of $A^2/2$. The resulting autocorrelation function, shown in Figure 6-5, follows from an application of (6-9).

Not all binary time functions have triangular autocorrelation functions, however. For example, another common type of binary signal is one in which the switching occurs at randomly spaced instants of time. If all times are equally probable, then the probability density function associated with the duration of each interval is exponential, as shown in Section 2-7. The resulting autocorrelation function is also exponential, as shown in Figure 6-6. The usual mathematical representation of such an autocorrelation function is

$$R_X(\tau) = A^2 e^{-2\alpha|\tau|} \tag{6-19}$$

where $\alpha$ is the average number of intervals per second.

Binary signals and correlation functions of the type shown in Figure 6-6 frequently arise in connection with radioactive monitoring devices.
The randomly occurring pulses at the output of a particle detector are used to trigger a flip-flop circuit that generates the binary signal. This type of signal is a convenient one for measuring either the average time interval between particles or the average rate of occurrence. It is usually referred to in the literature as the Random Telegraph Wave.

Figure 6-5 Autocorrelation function of a binary process with a nonzero mean value.
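The exponential shape for randomly spaced switching can be seen in a simulation. This hedged sketch assumes the usual random telegraph model: switching instants form a Poisson process with an average of $\alpha$ switchings per second (so interval lengths are independent exponential random variables), and the level alternates between $+A$ and $-A$. Under that model the autocorrelation is $A^2 e^{-2\alpha|\tau|}$. The helper names `telegraph_wave` and `time_autocorr` and all parameter values are illustrative choices; `time_autocorr` is the time-average estimate of (6-15) applied to one long sample function, relying on ergodicity.

```python
import math
import random

def telegraph_wave(alpha, amplitude, duration, dt, seed=0):
    """Sample a +/-A binary wave whose switching times form a Poisson process.

    Assumption: alpha is the average number of switchings per second, so the
    interval lengths are independent exponential random variables.
    """
    rng = random.Random(seed)
    level = amplitude if rng.random() < 0.5 else -amplitude
    next_switch = rng.expovariate(alpha)
    samples = []
    for i in range(int(round(duration / dt))):
        t = i * dt
        while t >= next_switch:             # apply every switch that occurred before t
            level = -level
            next_switch += rng.expovariate(alpha)
        samples.append(level)
    return samples

def time_autocorr(x, lag):
    """Time-average estimate of R_X at an integer lag, as in (6-15)."""
    m = len(x) - lag
    return sum(x[k] * x[k + lag] for k in range(m)) / m

alpha, amp, dt = 5.0, 1.0, 0.01
x = telegraph_wave(alpha, amp, duration=2000.0, dt=dt, seed=42)
for lag in (0, 5, 10, 20):
    tau = lag * dt
    print(f"tau={tau:.2f}  estimate={time_autocorr(x, lag):.3f}  "
          f"model={amp**2 * math.exp(-2 * alpha * tau):.3f}")
```

A long record (2000 s here, about 400 average interval lengths per second of correlation time) is used because, as (6-17) and (6-18) indicate, the estimate's scatter shrinks only as the observation time grows.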
Figure 6-6 (a) A binary signal with randomly spaced switching times and (b) the corresponding autocorrelation function.

Nonbinary signals can also have exponential correlation functions. For example, if very wideband noise (having almost any probability density function) is passed through a low-pass RC filter, the signal appearing at the output of the filter will have a nearly exponential autocorrelation function. This result is shown in detail in Chapter 8.

Both the triangular autocorrelation function and the exponential autocorrelation function share one feature that is worth noting. That is, in both cases the autocorrelation function has a discontinuous derivative at the origin. Random processes whose autocorrelation functions have this property are said to be nondifferentiable. A nondifferentiable process is one whose derivative has an infinite variance. For example, if a random voltage having an exponential autocorrelation function is applied to a capacitor, the resulting current is proportional to the derivative of the voltage, and this current would have an infinite variance. Since this does not make sense on a physical basis, the implication is that random processes having truly triangular or truly exponential autocorrelation functions cannot exist in the real world. In spite of this conclusion, which is indeed true, both the triangular and exponential autocorrelation functions provide useful models in many situations. One must be careful, however, not to use these models in any situation in which the derivative of the random process is needed, because the resulting calculation is almost certain to be wrong.

All of the correlation functions discussed so far have been positive for all values of $\tau$. This is not necessary, however, and two common types of autocorrelation functions that have negative regions are given by

(6-20)

and

$$R_X(\tau) = A^2\,\frac{\sin \pi\gamma\tau}{\pi\gamma\tau} \tag{6-21}$$

and are illustrated in Figure 6-7.
The autocorrelation function of (6-20) arises at the output of a narrowband bandpass filter whose input is very wideband noise, while that of (6-21) is
typical of the autocorrelation at the output of an ideal low-pass filter. Both of these results will be derived in Chapters 7 and 8.

Figure 6-7 The autocorrelation functions arising at the outputs of (a) a bandpass filter and (b) an ideal low-pass filter.

Although there are many other types of autocorrelation functions that arise in connection with signal and system analysis, the few discussed here are the ones most commonly encountered. The student should refer to the properties of autocorrelation functions discussed in Section 6-3 and verify that all these correlation functions possess those properties.

Exercise 6-5.1

a) Determine whether each of the random processes described by the autocorrelation functions of (6-20) and (6-21) is differentiable.

b) Indicate whether the following statement is true or false: The product of a function that is differentiable at the origin and a function that is nondifferentiable at the origin is always differentiable. Test your conclusion on the autocorrelation function of (6-20).

Answers: Yes, yes, true

Exercise 6-5.2

Which of the following functions of $\tau$ cannot be valid mathematical models for autocorrelation functions? Explain why.
b} c} d} e} I• I e-1•1 10e-(r+2)!2 +4 .2+ 8

Answers: b, c, e are not valid models.

6-6 Crosscorrelation Functions

It is also possible to consider the correlation between two random variables from different random processes. This situation arises when there is more than one random signal being applied to a system or when one wishes to compare random voltages or currents occurring at different points in the system. If the random processes are jointly stationary in the wide sense, and if sample functions from these processes are designated as $X(t)$ and $Y(t)$, then for two random variables

$$X_1 = X(t_1) \qquad Y_2 = Y(t_1 + \tau)$$

it is possible to define the crosscorrelation function

$$R_{XY}(\tau) = E[X_1 Y_2] \tag{6-22}$$

The order of subscripts is significant; the second subscript refers to the random variable taken at $(t_1 + \tau)$.⁴

There is also another crosscorrelation function that can be defined for the same two time instants. Thus, let

$$Y_1 = Y(t_1) \qquad X_2 = X(t_1 + \tau)$$

and define

⁴ This is an arbitrary convention, which is by no means universal with all authors. The definitions should be checked in every case.
$$R_{YX}(\tau) = E[Y_1 X_2] \tag{6-23}$$

Note that because both random processes are assumed to be jointly stationary, these crosscorrelation functions depend only upon the time difference $\tau$.

It is important that the processes be jointly stationary and not just individually stationary. It is quite possible to have two individually stationary random processes that are not jointly stationary. In such a case, the crosscorrelation function depends upon time, as well as the time difference $\tau$.

The time crosscorrelation functions may be defined as before for a particular pair of sample functions as

$$\mathcal{R}_{xy}(\tau) = \lim_{T \to \infty}\frac{1}{2T}\int_{-T}^{T} x(t)y(t + \tau)\,dt \tag{6-24}$$

and

$$\mathcal{R}_{yx}(\tau) = \lim_{T \to \infty}\frac{1}{2T}\int_{-T}^{T} y(t)x(t + \tau)\,dt \tag{6-25}$$

If the random processes are jointly ergodic, then (6-24) and (6-25) yield the same value for every pair of sample functions. Hence, for ergodic processes,

$$\mathcal{R}_{xy}(\tau) = R_{XY}(\tau) \tag{6-26}$$

$$\mathcal{R}_{yx}(\tau) = R_{YX}(\tau) \tag{6-27}$$

In general, the physical interpretation of crosscorrelation functions is no more concrete than that of autocorrelation functions. It is simply a measure of how much these two random variables depend upon one another. In the later study of system analysis, however, the specific crosscorrelation function between system input and output will take on a very definite and important physical significance.

Exercise 6-6.1

Two jointly stationary random processes have sample functions of the form

$$X(t) = 2\cos(5t + \theta)$$

and

$$Y(t) = 10\sin(5t + \theta)$$
where $\theta$ is a random variable that is uniformly distributed from 0 to $2\pi$. Find the crosscorrelation function $R_{XY}(\tau)$ for these two processes.

Answer: $20\sin(5\tau)$

Exercise 6-6.2

Two sample functions from two random processes have the form

$$x(t) = 2\cos 5t \quad\text{and}\quad y(t) = 10\sin 5t$$

Find the time crosscorrelation function for $x(t)$ and $y(t + \tau)$.

Answer: $20\sin(5\tau)$

6-7 Properties of Crosscorrelation Functions

The general properties of all crosscorrelation functions are quite different from those of autocorrelation functions. They may be summarized as follows:

1. The quantities $R_{XY}(0)$ and $R_{YX}(0)$ have no particular physical significance and do not represent mean-square values. It is true, however, that $R_{XY}(0) = R_{YX}(0)$.

2. Crosscorrelation functions are not generally even functions of $\tau$. There is a type of symmetry, however, as indicated by the relation

$$R_{YX}(\tau) = R_{XY}(-\tau) \tag{6-28}$$

This result follows from the fact that a shift of $Y(t)$ in one direction (in time) is equivalent to a shift of $X(t)$ in the other direction.

3. The crosscorrelation function does not necessarily have its maximum value at $\tau = 0$. It can be shown, however, that

$$|R_{XY}(\tau)| \le [R_X(0)R_Y(0)]^{1/2} \tag{6-29}$$

with a similar relationship for $R_{YX}(\tau)$. The maximum of the crosscorrelation function can occur anywhere, but it cannot exceed the above value. Furthermore, it may not achieve this value anywhere.
4. If the two random processes are statistically independent, then

$$R_{XY}(\tau) = E[X_1 Y_2] = E[X_1]E[Y_2] = \bar{X}\,\bar{Y} = R_{YX}(\tau) \tag{6-30}$$

If, in addition, either process has zero mean, then the crosscorrelation function vanishes for all $\tau$. The converse of this is not necessarily true, however. The fact that the crosscorrelation function is zero and that one process has zero mean does not imply that the random processes are statistically independent, except for jointly Gaussian random variables.

5. If $X(t)$ is a stationary random process and $\dot{X}(t)$ is its derivative with respect to time, the crosscorrelation function of $X(t)$ and $\dot{X}(t)$ is given by

$$R_{X\dot{X}}(\tau) = \frac{dR_X(\tau)}{d\tau} \tag{6-31}$$

in which the right side of (6-31) is the derivative of the autocorrelation function with respect to $\tau$. This is easily shown by employing the fundamental definition of a derivative

$$\dot{X}(t) = \lim_{\epsilon \to 0}\frac{X(t + \epsilon) - X(t)}{\epsilon}$$

Hence,

$$\begin{aligned} R_{X\dot{X}}(\tau) &= E[X(t)\dot{X}(t + \tau)] \\ &= E\left\{\lim_{\epsilon \to 0}\frac{X(t)X(t + \tau + \epsilon) - X(t)X(t + \tau)}{\epsilon}\right\} \\ &= \lim_{\epsilon \to 0}\frac{R_X(\tau + \epsilon) - R_X(\tau)}{\epsilon} = \frac{dR_X(\tau)}{d\tau} \end{aligned}$$

The interchange of the limit operation and the expectation is permissible whenever $\dot{X}(t)$ exists. If the above process is repeated, it is also possible to show that the autocorrelation function of $\dot{X}(t)$ is

$$R_{\dot{X}}(\tau) = -\frac{d^2R_X(\tau)}{d\tau^2} \tag{6-32}$$

where the right side is the negative of the second derivative of the basic autocorrelation function with respect to $\tau$.

It is worth noting that the requirements for the existence of crosscorrelation functions are more relaxed than those for the existence of autocorrelation functions. Crosscorrelation functions are generally not even functions of $\tau$, their Fourier transforms do not have to be positive for all values of $\omega$, and it is not even necessary that the Fourier transforms be real. These latter two points are discussed in more detail in the next chapter.
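Property 5 can be checked numerically on the random-phase sinusoid $X(t) = A\cos(\omega t + \theta)$ of Section 6-3, for which $R_X(\tau) = \frac{A^2}{2}\cos\omega\tau$. The hedged sketch below replaces the expectation over a uniform phase by an average over an evenly spaced grid of $\theta$ values (exact for the low-order trigonometric products involved), uses the analytic derivative $\dot{X}(t) = -A\omega\sin(\omega t + \theta)$, and compares $E[X(t)\dot{X}(t+\tau)]$ and $E[\dot{X}(t)\dot{X}(t+\tau)]$ with central finite differences of $R_X$, i.e. with (6-31) and (6-32). The parameter values and function names are illustrative assumptions.

```python
import math

A, W, T1 = 2.0, 3.0, 0.7                 # illustrative amplitude, radian frequency, time t
PHASES = [2 * math.pi * i / 720 for i in range(720)]   # evenly spaced theta values

def x(t, th):
    return A * math.cos(W * t + th)

def x_dot(t, th):                        # analytic time derivative of x
    return -A * W * math.sin(W * t + th)

def phase_average(f):
    """Approximate E[f(theta)] for theta uniform on [0, 2*pi)."""
    return sum(f(th) for th in PHASES) / len(PHASES)

def r_x(tau):                            # R_X(tau) = (A^2/2) cos(W tau)
    return 0.5 * A**2 * math.cos(W * tau)

def check(tau, h=1e-4):
    """Return (R_XXdot, dR_X/dtau, R_Xdot, -d2R_X/dtau2) at lag tau."""
    r_x_xdot = phase_average(lambda th: x(T1, th) * x_dot(T1 + tau, th))
    r_xdot = phase_average(lambda th: x_dot(T1, th) * x_dot(T1 + tau, th))
    d1 = (r_x(tau + h) - r_x(tau - h)) / (2 * h)                # central difference
    d2 = (r_x(tau + h) - 2 * r_x(tau) + r_x(tau - h)) / h**2    # second difference
    return r_x_xdot, d1, r_xdot, -d2

for tau in (0.0, 0.3, 1.1):
    print(tau, [round(v, 4) for v in check(tau)])
```

For these values the pairs should agree: $R_{X\dot{X}}(\tau) = -6\sin 3\tau$ and $R_{\dot{X}}(\tau) = 18\cos 3\tau$. Note that $R_{X\dot{X}}(0) = 0$, consistent with $R_X$ having zero slope at its maximum.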
Exercise 6-7.1

Prove the inequality shown in Equation (6-29). This is most easily done by evaluating the expected value of the quantity

$$\left[\frac{X_1}{\sqrt{R_X(0)}} \pm \frac{Y_2}{\sqrt{R_Y(0)}}\right]^2$$

Exercise 6-7.2

Two random processes have sample functions of the form

$$X(t) = A\cos(\omega_0 t + \theta) \quad\text{and}\quad Y(t) = B\sin(\omega_0 t + \theta)$$

where $\theta$ is a random variable that is uniformly distributed between 0 and $2\pi$, and $A$ and $B$ are constants.

a) Find the crosscorrelation functions $R_{XY}(\tau)$ and $R_{YX}(\tau)$.

b) What is the significance of the values of these crosscorrelation functions at $\tau = 0$?

Answer: $\pm\left(\dfrac{AB}{2}\right)\sin\omega_0\tau$

6-8 Examples and Applications of Crosscorrelation Functions

It was noted previously that one of the applications of crosscorrelation functions is in connection with systems with two or more random inputs. To explore this in more detail, consider a random process whose sample functions are of the form

$$Z(t) = X(t) \pm Y(t)$$

in which $X(t)$ and $Y(t)$ are also sample functions of random processes. Then defining the random variables as

$$Z_1 = X_1 \pm Y_1 = X(t_1) \pm Y(t_1)$$

$$Z_2 = X_2 \pm Y_2 = X(t_1 + \tau) \pm Y(t_1 + \tau)$$

the autocorrelation function of $Z(t)$ is
$$\begin{aligned} R_Z(\tau) &= E[Z_1 Z_2] = E[(X_1 \pm Y_1)(X_2 \pm Y_2)] \\ &= E[X_1X_2 + Y_1Y_2 \pm X_1Y_2 \pm Y_1X_2] \\ &= R_X(\tau) + R_Y(\tau) \pm R_{XY}(\tau) \pm R_{YX}(\tau) \end{aligned} \tag{6-33}$$

This result is easily extended to the sum of any number of random variables. In general, the autocorrelation function of such a sum will be the sum of all the autocorrelation functions plus the sum of all the crosscorrelation functions.

If the two random processes being considered are statistically independent and one of them has zero mean, then both of the crosscorrelation functions in (6-33) vanish and the autocorrelation function of the sum is just the sum of the autocorrelation functions. An example of the importance of this result arises in connection with the extraction of periodic signals from random noise. Let $X(t)$ be a desired signal sample function of the form

$$X(t) = A\cos(\omega t + \theta) \tag{6-34}$$

where $\theta$ is a random variable uniformly distributed over $(0, 2\pi)$. It was shown previously that the autocorrelation function of this process is

$$R_X(\tau) = \frac{1}{2}A^2\cos\omega\tau$$

Next, let $Y(t)$ be a sample function of zero-mean random noise that is statistically independent of the signal and specify that it has an autocorrelation function of the form

$$R_Y(\tau) = B^2 e^{-\alpha|\tau|}$$

The observed quantity is $Z(t)$, which from (6-33) has an autocorrelation function of

$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) = \frac{1}{2}A^2\cos\omega\tau + B^2 e^{-\alpha|\tau|} \tag{6-35}$$

This function is sketched in Figure 6-8 for a case in which the average noise power, $\overline{Y^2}$, is much larger than the average signal power, $\frac{1}{2}A^2$. It is clear from the sketch that for large values of $\tau$, the autocorrelation function depends mostly upon the signal, since the noise autocorrelation function tends to zero as $\tau$ tends to infinity.
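A small simulation shows the signal term of (6-35) emerging at lags where the noise term has died out. In this hedged sketch the noise is taken to be white Gaussian (independent samples), a simplifying assumption in place of the exponential $R_Y$ above, so its autocorrelation is essentially zero at every nonzero lag. The estimated $R_Z$ at a half period and a full period of the sinusoid should then be close to $-\frac{1}{2}A^2$ and $+\frac{1}{2}A^2$, even though the noise power is 18 times the signal power. The 50-Hz signal, 1000-Hz sampling rate, record length, and helper name `biased_acf_at` are illustrative choices.

```python
import math
import random

def biased_acf_at(z, lag):
    """Biased time-average estimate of R_Z at an integer lag, as in (6-16)."""
    return sum(z[k] * z[k + lag] for k in range(len(z) - lag)) / len(z)

fs, f0, amp, noise_std = 1000.0, 50.0, 1.0, 3.0    # noise power 9 vs signal power 0.5
n_samples = 200_000
rng = random.Random(7)
theta = rng.uniform(0.0, 2 * math.pi)               # random phase of the signal
# Assumption: white Gaussian noise rather than the exponential R_Y of the text.
z = [amp * math.cos(2 * math.pi * f0 * k / fs + theta) + rng.gauss(0.0, noise_std)
     for k in range(n_samples)]

period = int(fs / f0)                               # 20 samples per cycle
for lag in (0, period // 2, period, 3 * period // 2, 10 * period):
    signal_term = 0.5 * amp**2 * math.cos(2 * math.pi * f0 * lag / fs)
    print(lag, round(biased_acf_at(z, lag), 3), "signal term:", round(signal_term, 3))
```

At lag 0 the estimate is dominated by the noise power (about 9.5 in total); away from the origin it oscillates at 50 Hz with amplitude near 0.5. Reducing `n_samples` makes the oscillation harder to see, in line with the sample-size discussion of Section 6-4.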
Thus, it should be possible to extract tiny amounts of sinusoidal signal from large amounts of noise by using an appropriate method for measuring the autocorrelation function of the received signal plus noise.

Another method of extracting a small known signal from a combination of signal and noise is to perform a crosscorrelation operation. A typical example of this might be a radar system that is transmitting a signal $X(t)$. The signal that is returned from any target is a very much smaller version of $X(t)$ and has been delayed in time by the propagation time to the target and back.
Figure 6-8 Autocorrelation function of sinusoidal signal plus noise.

Since noise is always present at the input to the radar receiver, the total received signal $Y(t)$ may be represented as

$$Y(t) = aX(t - \tau_1) + N(t) \tag{6-36}$$

where $a$ is a number very much smaller than 1, $\tau_1$ is the round-trip delay time of the signal, and $N(t)$ is the receiver noise. In a typical situation the average power of the returned signal, $aX(t - \tau_1)$, is very much smaller than the average power of the noise, $N(t)$.

The crosscorrelation function of the transmitted signal and the total receiver input is

$$\begin{aligned} R_{XY}(\tau) &= E[X(t)Y(t + \tau)] \\ &= E[aX(t)X(t + \tau - \tau_1) + X(t)N(t + \tau)] \\ &= aR_X(\tau - \tau_1) + R_{XN}(\tau) \end{aligned} \tag{6-37}$$

Since the signal and noise are statistically independent and have zero mean (because they are RF bandpass signals), the crosscorrelation function between $X(t)$ and $N(t)$ is zero for all values of $\tau$. Thus, (6-37) becomes

$$R_{XY}(\tau) = aR_X(\tau - \tau_1) \tag{6-38}$$

Remembering that autocorrelation functions have their maximum values at the origin, it is clear that if $\tau$ is adjusted so that the measured value of $R_{XY}(\tau)$ is a maximum, then $\tau = \tau_1$ and this value indicates the distance to the target.

In some situations involving two random processes it is possible to observe both the sum and the difference of the two processes, but not each one individually. In this case, one may be interested in the crosscorrelation between the sum and difference as a means of learning something about them. Suppose, for example, that we have available two processes described by

$$U(t) = X(t) + Y(t) \tag{6-39}$$

$$V(t) = X(t) - Y(t) \tag{6-40}$$
in which X(t) and Y(t) are not necessarily zero mean nor statistically independent. The crosscorrelation function between U(t) and V(t) is

$$\begin{aligned} R_{UV}(\tau) &= E[U(t)V(t+\tau)] \\ &= E\{[X(t)+Y(t)][X(t+\tau)-Y(t+\tau)]\} \\ &= E[X(t)X(t+\tau) + Y(t)X(t+\tau) - X(t)Y(t+\tau) - Y(t)Y(t+\tau)] \end{aligned} \tag{6-41}$$

Each of the expected values in (6-41) may be identified as an autocorrelation function or a crosscorrelation function. Thus,

$$R_{UV}(\tau) = R_X(\tau) + R_{YX}(\tau) - R_{XY}(\tau) - R_Y(\tau) \tag{6-42}$$

In a similar way, the reader may easily verify that the other crosscorrelation function is

$$R_{VU}(\tau) = R_X(\tau) - R_{YX}(\tau) + R_{XY}(\tau) - R_Y(\tau) \tag{6-43}$$

If both X and Y are zero mean and statistically independent, both crosscorrelation functions reduce to the same function, namely

$$R_{UV}(\tau) = R_{VU}(\tau) = R_X(\tau) - R_Y(\tau) \tag{6-44}$$

The actual measurement of crosscorrelation functions can be carried out in much the same way as that suggested for measuring autocorrelation functions in Section 6-4. This type of measurement is still unbiased when crosscorrelation functions are being considered, but the result given in (6-17) for the variance of the estimate is no longer strictly true, particularly if one of the signals contains additive uncorrelated noise, as in the radar example just discussed. Generally speaking, the number of samples required to obtain a given variance in the estimate of a crosscorrelation function is much greater than that required for an autocorrelation function.

To illustrate crosscorrelation computations using the computer, consider the following example. A signal $x(t) = 2\sin(100\pi t + \theta)$ is measured in the presence of Gaussian noise having a bandwidth of 50 Hz and a standard deviation of 5. This corresponds to a signal-to-noise (power) ratio of $0.5 \times 2^2/5^2 = 0.08$, or about −11 dB. This signal is sampled at a rate of 1000 samples per second for 0.5 second, giving 501 samples.
These samples are processed in two ways: by computing the autocorrelation function of the signal and by computing the crosscorrelation function of the signal and another deterministic signal, $\sin(100\pi t)$. For purposes of this example it will be assumed that the random variable $\theta$ takes on the value of $\pi/4$. The following MATLAB program generates the signals, carries out the processing, and plots the results.

```matlab
% corxmp4.m  crosscorrelation example
% corb is not a built-in MATLAB function; it is the correlation routine
% this program relies on: [lags, R] = corb(a, b, fs)
T = 0.5; fs = 1000; dt = 1/fs; fo = 50; N = T/dt;
t1 = 0:.001:.5;
x = 2*sin(2*fo*pi*t1 + .25*pi*ones(size(t1)));   % signal with theta = pi/4
randn('seed', 1000);                             % legacy seeding syntax
y1 = randn(1, N+1);
[b,a] = butter(2, 50/500);                       % 2nd-order 50-Hz LP filter
y = filter(b, a, y1);                            % filter noise
y = 5*y/std(y);                                  % noise standard deviation = 5
z = x + y;                                       % received signal plus noise
[t2,u] = corb(z, z, fs);                         % autocorrelation of z
x1 = sin(2*fo*pi*t1);                            % reference sinusoid
[t3,v] = corb(z, x1, fs);                        % crosscorrelation of z with reference
subplot(3,1,1); plot(t1,z); xlabel('TIME'); ylabel('z(t)');
subplot(3,1,2); plot(t2,u); xlabel('LAG'); ylabel('Rzz');
subplot(3,1,3); plot(t3,v); xlabel('LAG'); ylabel('Rxz');
```

The results are shown in Figure 6-9.

[Figure 6-9: Signal, autocorrelation function, and crosscorrelation function (CORXMP4). Panels: z(t) versus TIME; Rzz versus LAG; Rxz versus LAG.]

The autocorrelation function of the signal indicates the possibility of a sinusoidal signal being present, but not distinctly. However, the crosscorrelation
function clearly shows the presence of the signal. It would be possible to determine the phase of the sinusoid by measuring the time lag of the peak of the crosscorrelation function from the origin and multiplying by $2\pi/T$, where T is the period of the sinusoid.

Exercise 6-8.1

A random process has sample functions of the form X(t) = A, in which A is a random variable that has a mean value of 5 and a variance of 10. Sample functions from this process can be observed only in the presence of independent noise having an autocorrelation function of

$$R_N(\tau) = 10\exp(-2|\tau|)$$

a) Find the autocorrelation function of the sum of these two processes.

b) If the autocorrelation function of the sum is observed, find the value of $\tau$ at which this autocorrelation function is within 1% of its value at $\tau = \infty$.

Answers: 1.68, $35 + 10e^{-2|\tau|}$

Exercise 6-8.2

A random binary process such as that described in Section 6-2 has sample functions with amplitudes of ±12 and $t_a = 0.01$. It is applied to the half-wave-rectifier circuit shown below.

[Figure: half-wave rectifier circuit with an ideal diode and resistor network; input X(t), output Y(t).]

a) Find the autocorrelation function of the output, $R_Y(\tau)$.

b) Find the crosscorrelation function $R_{XY}(\tau)$.

c) Find the crosscorrelation function $R_{YX}(\tau)$.

Answers: $9 + 9(1 - |\tau|/0.01)$, $36[1 - |\tau|/0.01]$
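The radar delay-estimation idea of (6-36) through (6-38) can be tried numerically. The following is a minimal Python sketch, not the book's MATLAB method: it assumes a white Gaussian transmitted signal (so $R_X$ is an impulse at zero lag), and the echo amplitude, delay, and noise level are made-up illustrative values. It estimates $R_{XY}$ by time averaging over the overlapping samples at each lag and takes the lag of the largest value as the estimate of $\tau_1$.

```python
import random


def crosscorr(x, y, max_lag):
    """Time-average estimate of R_XY[k] = E[x[n] * y[n + k]], k = 0..max_lag.

    Each lag is averaged over the overlapping samples only (unbiased form).
    """
    n = len(x)
    estimates = []
    for k in range(max_lag + 1):
        terms = n - k
        estimates.append(sum(x[i] * y[i + k] for i in range(terms)) / terms)
    return estimates


# Illustrative values only (not from the text): white Gaussian transmitted
# signal X with R_X[0] = 1, echo amplitude a = 0.2, round-trip delay of 37
# samples, and unit-variance receiver noise N, so the echo power is 0.04
# times the noise power (about -14 dB).
random.seed(1)
n_samples, true_delay, a = 10000, 37, 0.2
x = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
y = [(a * x[i - true_delay] if i >= true_delay else 0.0) + random.gauss(0.0, 1.0)
     for i in range(n_samples)]

r_xy = crosscorr(x, y, max_lag=100)
# (6-38): R_XY(tau) = a R_X(tau - tau_1) peaks at tau = tau_1.
tau_hat = max(range(len(r_xy)), key=lambda k: r_xy[k])
print("estimated delay:", tau_hat, "samples; peak value:", round(r_xy[tau_hat], 3))
```

Because $R_X$ is 1 at zero lag here, the peak height also estimates $a$. The direct double loop is chosen for clarity; for long records an FFT-based correlation would be much faster.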
6-9 Correlation Matrices for Sampled Functions

The discussion of correlation thus far has concentrated on only two random variables. Thus, for stationary processes the correlation functions can be expressed as a function of the single variable $\tau$. There are many practical situations, however, in which there may be many random variables and it is necessary to develop some convenient method for representing the many autocorrelations and crosscorrelations that arise. The use of vector notation provides a convenient way of representing a set of random variables, and the product of vectors that is necessary to obtain correlations results in a matrix. It is important, therefore, to discuss some situations in which the vector representation is useful and to describe some of the properties of the resulting correlation matrices.

A situation in which vector notation is useful in representing a signal arises in the case of a single time function that is sampled at periodic time instants. If only a finite number of such samples are to be considered, say N, then each sample value can become a component of an (N × 1) vector. Thus, if the sampling times are $t_1, t_2, \ldots, t_N$, the vector representing the time function X(t) may be expressed as

$$\mathbf{X} = \begin{bmatrix} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_N) \end{bmatrix}$$

If X(t) is a sample function from a random process, then each of the components of the vector $\mathbf{X}$ is a random variable.

It is now possible to define a correlation matrix that is (N × N) and gives the correlation between every pair of random variables. Thus,

$$\mathbf{R}_X = E[\mathbf{X}\mathbf{X}^T] = E\begin{bmatrix} X(t_1)X(t_1) & X(t_1)X(t_2) & \cdots & X(t_1)X(t_N) \\ X(t_2)X(t_1) & X(t_2)X(t_2) & \cdots & X(t_2)X(t_N) \\ \vdots & \vdots & & \vdots \\ X(t_N)X(t_1) & X(t_N)X(t_2) & \cdots & X(t_N)X(t_N) \end{bmatrix}$$

where $\mathbf{X}^T$ is the transpose of $\mathbf{X}$. When the expected value of each element of the matrix is taken, that element becomes a particular value of the autocorrelation function of the random process from which X(t) came. Thus,

$$\mathbf{R}_X = \begin{bmatrix} R_X(t_1,t_1) & R_X(t_1,t_2) & \cdots & R_X(t_1,t_N) \\ R_X(t_2,t_1) & R_X(t_2,t_2) & \cdots & R_X(t_2,t_N) \\ \vdots & \vdots & & \vdots \\ R_X(t_N,t_1) & R_X(t_N,t_2) & \cdots & R_X(t_N,t_N) \end{bmatrix} \tag{6-45}$$
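The definition $\mathbf{R}_X = E[\mathbf{X}\mathbf{X}^T]$ can be checked by averaging outer products over an ensemble of sample vectors. The Python sketch below assumes, purely for illustration, a random-phase sinusoid $X(t) = \cos(\omega_0 t + \theta)$ with $\theta$ uniform on $(-\pi, \pi)$, for which $R_X(t_i, t_j) = \tfrac{1}{2}\cos[\omega_0(t_j - t_i)]$ is known exactly; the sampling instants, $\omega_0$, and ensemble size are hypothetical choices.

```python
import math
import random


def sample_vector(times, omega0, rng):
    """One realization of X = [X(t_1), ..., X(t_N)]^T for X(t) = cos(omega0*t + theta)."""
    theta = rng.uniform(-math.pi, math.pi)
    return [math.cos(omega0 * t + theta) for t in times]


def correlation_matrix(vectors):
    """Ensemble average of the outer products X X^T, an estimate of E[X X^T]."""
    n = len(vectors[0])
    acc = [[0.0] * n for _ in range(n)]
    for v in vectors:
        for i in range(n):
            for j in range(n):
                acc[i][j] += v[i] * v[j]
    return [[value / len(vectors) for value in row] for row in acc]


rng = random.Random(7)
times = [0.0, 0.5, 1.0, 1.5]   # hypothetical sampling instants t_1..t_4 (s)
omega0 = 1.0                   # hypothetical angular frequency (rad/s)
ensemble = [sample_vector(times, omega0, rng) for _ in range(20000)]
r_hat = correlation_matrix(ensemble)

# Known result for this process: R_X(t_i, t_j) = 0.5 cos[omega0 (t_j - t_i)].
r_true = [[0.5 * math.cos(omega0 * (tj - ti)) for tj in times] for ti in times]
for est_row, true_row in zip(r_hat, r_true):
    print("  ".join(f"{e:6.3f} ({t:6.3f})" for e, t in zip(est_row, true_row)))
```

Each printed entry shows the ensemble estimate with the exact value in parentheses. The estimate is symmetric by construction, and because this particular process is wide-sense stationary its diagonals are also (approximately) constant.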
When the random process from which X(t) came is wide-sense stationary, then all the components of $\mathbf{R}_X$ become functions of time difference only. If the interval between sample values is $\Delta t$, then

$$t_2 = t_1 + \Delta t, \qquad t_3 = t_1 + 2\Delta t, \qquad \ldots, \qquad t_N = t_1 + (N-1)\Delta t$$

and

$$\mathbf{R}_X = \begin{bmatrix} R_X[0] & R_X[\Delta t] & \cdots & R_X[(N-1)\Delta t] \\ R_X[\Delta t] & R_X[0] & \cdots & R_X[(N-2)\Delta t] \\ \vdots & \vdots & & \vdots \\ R_X[(N-1)\Delta t] & R_X[(N-2)\Delta t] & \cdots & R_X[0] \end{bmatrix} \tag{6-46}$$

where use has been made of the symmetry of the autocorrelation function; that is, $R_X[i\Delta t] = R_X[-i\Delta t]$. Note that as a consequence of the symmetry, $\mathbf{R}_X$ is a symmetric matrix (even in the nonstationary case), and that as a consequence of stationarity, the major diagonal (and all diagonals parallel to it) have identical elements.

Although the $\mathbf{R}_X$ just defined is a logical consequence of previous definitions, it is not the most customary way of designating the correlation matrix of a random vector consisting of sample values. A more common procedure is to define a covariance matrix, which contains the variances and covariances of the random variables. The general covariance between two random variables is defined as

$$E\{[X(t_i) - \bar{X}(t_i)][X(t_j) - \bar{X}(t_j)]\} = \sigma_i\sigma_j\rho_{ij} \tag{6-47}$$

where

$\bar{X}(t_i)$ = mean value of $X(t_i)$
$\bar{X}(t_j)$ = mean value of $X(t_j)$
$\sigma_i^2$ = variance of $X(t_i)$
$\sigma_j^2$ = variance of $X(t_j)$
$\rho_{ij}$ = normalized covariance coefficient of $X(t_i)$ and $X(t_j)$, with $\rho_{ij} = 1$ when $i = j$

The covariance matrix is defined as

$$\boldsymbol{\Lambda}_X = E[(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X}^T - \bar{\mathbf{X}}^T)] \tag{6-48}$$

where $\bar{\mathbf{X}}$ is the mean value of $\mathbf{X}$. Using the covariance definitions leads immediately to
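$$\boldsymbol{\Lambda}_X = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{12} & \cdots & \sigma_1\sigma_N\rho_{1N} \\ \sigma_2\sigma_1\rho_{21} & \sigma_2^2 & \cdots & \sigma_2\sigma_N\rho_{2N} \\ \vdots & \vdots & & \vdots \\ \sigma_N\sigma_1\rho_{N1} & \sigma_N\sigma_2\rho_{N2} & \cdots & \sigma_N^2 \end{bmatrix} \tag{6-49}$$

since $\rho_{ii} = 1$ for $i = 1, 2, \ldots, N$. By expanding (6-49) it is easy to show that $\boldsymbol{\Lambda}_X$ is related to $\mathbf{R}_X$ by

$$\boldsymbol{\Lambda}_X = \mathbf{R}_X - \bar{\mathbf{X}}\bar{\mathbf{X}}^T \tag{6-50}$$

If the random process has a zero mean, then $\boldsymbol{\Lambda}_X = \mathbf{R}_X$.

The above representation for the covariance matrix is valid for both stationary and nonstationary processes. In the case of a wide-sense stationary process, however, all the variances are the same and the correlation coefficients in a given diagonal are the same. Thus,

$$\sigma_i = \sigma \quad\text{and}\quad \rho_{ij} = \rho_{|i-j|}, \qquad i, j = 1, 2, \ldots, N$$

and

$$\boldsymbol{\Lambda}_X = \sigma^2\begin{bmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{N-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{N-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{N-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{N-1} & \rho_{N-2} & \rho_{N-3} & \cdots & 1 \end{bmatrix} \tag{6-51}$$

Such a matrix is said to be Toeplitz.

As an illustration of some of the above concepts, suppose we have a stationary random process whose autocorrelation function is given by

$$R_X(\tau) = 10e^{-|\tau|} + 9 \tag{6-52}$$

To keep the example simple, assume that three random variables separated by 1 second are to be considered. Thus, N = 3 and $\Delta t = 1$. Evaluating (6-52) for $\tau = 0, 1, 2$ yields the values that are needed for the correlation matrix. Thus, the correlation matrix becomes

$$\mathbf{R}_X = \begin{bmatrix} 19 & 12.68 & 10.35 \\ 12.68 & 19 & 12.68 \\ 10.35 & 12.68 & 19 \end{bmatrix}$$

Since the variance of this process is 10 and its mean value is ±3, the covariance matrix is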
$$\boldsymbol{\Lambda}_X = 10\begin{bmatrix} 1 & 0.368 & 0.135 \\ 0.368 & 1 & 0.368 \\ 0.135 & 0.368 & 1 \end{bmatrix}$$

Another situation in which the use of vector notation is convenient arises when the random variables come from different random processes. In this case, the vector representing all the random variables might be written as

$$\mathbf{X}(t) = \begin{bmatrix} X_1(t) \\ X_2(t) \\ \vdots \\ X_N(t) \end{bmatrix}$$

The correlation matrix is now defined as

$$\mathbf{R}_X(\tau) = E[\mathbf{X}(t)\mathbf{X}^T(t+\tau)] = \begin{bmatrix} R_1(\tau) & R_{12}(\tau) & \cdots & R_{1N}(\tau) \\ R_{21}(\tau) & R_2(\tau) & \cdots & R_{2N}(\tau) \\ \vdots & \vdots & & \vdots \\ R_{N1}(\tau) & R_{N2}(\tau) & \cdots & R_N(\tau) \end{bmatrix} \tag{6-53}$$

in which

$$R_i(\tau) = E[X_i(t)X_i(t+\tau)], \qquad R_{ij}(\tau) = E[X_i(t)X_j(t+\tau)]$$

Note that in this case, the elements of the correlation matrix are functions of $\tau$ rather than numbers as they were in the case of the correlation matrix associated with samples taken from a single random process. Situations in which such a correlation matrix might occur arise in connection with antenna arrays or arrays of seismic detectors. In such systems, the noise signals at each antenna element, or each seismic detector, may be from different, but correlated, random processes.

Before we leave the subject of covariance matrices, it is worth noting the important role that these matrices play in connection with the joint probability density function for N random variables from a Gaussian process. It was noted earlier that the Gaussian process was one of the few for which it is possible to write a joint probability density function for any number of random variables. The derivation of this joint density function is beyond the scope of this discussion, but it can be shown that it becomes

$$f(\mathbf{x}) = f[x(t_1), x(t_2), \ldots, x(t_N)] = \frac{1}{(2\pi)^{N/2}|\boldsymbol{\Lambda}_X|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}^T - \bar{\mathbf{X}}^T)\boldsymbol{\Lambda}_X^{-1}(\mathbf{x} - \bar{\mathbf{X}})\right] \tag{6-54}$$

where $|\boldsymbol{\Lambda}_X|$ is the determinant of $\boldsymbol{\Lambda}_X$ and $\boldsymbol{\Lambda}_X^{-1}$ is its inverse.
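The numerical example of (6-52) and the density (6-54) can be reproduced with a short Python sketch. It builds the Toeplitz correlation matrix, applies (6-50) to get the covariance matrix, and evaluates (6-54) using Gaussian elimination for the determinant and for $\boldsymbol{\Lambda}_X^{-1}(\mathbf{x} - \bar{\mathbf{X}})$. Two assumptions are ours, not the text's: the process is treated as Gaussian so that (6-54) applies, and the mean is taken as +3 (the text only says ±3; the covariance matrix is the same either way).

```python
import math


def toeplitz(first_row):
    """Symmetric Toeplitz matrix whose (i, j) entry is first_row[|i - j|]."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]


def det_and_solve(a, b):
    """Gaussian elimination with partial pivoting; returns (det(a), a^-1 b)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy of [a | b]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                                  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return det, x


def gaussian_pdf(x, mean, cov):
    """Joint Gaussian density of (6-54) evaluated at the vector x."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mean)]
    det, w = det_and_solve(cov, d)                      # w = cov^-1 (x - mean)
    quad = sum(di * wi for di, wi in zip(d, w))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * math.sqrt(det))


def r_x(tau):
    """Autocorrelation function of (6-52)."""
    return 10 * math.exp(-abs(tau)) + 9


mean_value = 3.0                                        # assumed sign of the mean
R = toeplitz([r_x(k * 1.0) for k in range(3)])          # N = 3, delta t = 1 s
cov = [[R[i][j] - mean_value ** 2 for j in range(3)] for i in range(3)]   # (6-50)
mean_vec = [mean_value] * 3

print([[round(v, 2) for v in row] for row in R])
print([[round(v, 3) for v in row] for row in cov])
print(gaussian_pdf(mean_vec, mean_vec, cov))            # density at the mean
```

The first two printed matrices should match $\mathbf{R}_X$ and $\boldsymbol{\Lambda}_X$ above (entries 12.68, 10.35 and 10 × 0.368, 10 × 0.135). For an exponential correlation like this one the determinant works out to $10^3(1 - e^{-2})^2$, which is a convenient check on the elimination routine.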
The concept of correlation matrices can also be extended to represent crosscorrelation functions. Suppose we have two random vectors $\mathbf{X}(t)$ and $\mathbf{Y}(t)$, where each vector contains N random variables. Thus, let

$$\mathbf{X}(t) = \begin{bmatrix} X_1(t) \\ X_2(t) \\ \vdots \\ X_N(t) \end{bmatrix}, \qquad \mathbf{Y}(t) = \begin{bmatrix} Y_1(t) \\ Y_2(t) \\ \vdots \\ Y_N(t) \end{bmatrix}$$

By analogy to (6-53) the crosscorrelation matrix can be defined as

$$\mathbf{R}_{XY}(\tau) = E[\mathbf{X}(t)\mathbf{Y}^T(t+\tau)] = \begin{bmatrix} R_{11}(\tau) & R_{12}(\tau) & \cdots & R_{1N}(\tau) \\ R_{21}(\tau) & R_{22}(\tau) & \cdots & R_{2N}(\tau) \\ \vdots & \vdots & & \vdots \\ R_{N1}(\tau) & R_{N2}(\tau) & \cdots & R_{NN}(\tau) \end{bmatrix} \tag{6-55}$$

where now

$$R_{ii}(\tau) = E[X_i(t)Y_i(t+\tau)], \qquad R_{ij}(\tau) = E[X_i(t)Y_j(t+\tau)]$$

In many situations the vector of random processes $\mathbf{Y}(t)$ is the sum of the vector $\mathbf{X}(t)$ and a statistically independent noise vector $\mathbf{N}(t)$ that has zero mean. In this case, (6-55) reduces to the autocorrelation matrix of (6-53) because the crosscorrelation between $\mathbf{X}(t)$ and $\mathbf{N}(t)$ is identically zero. There are other situations in which the elements of the vector $\mathbf{Y}(t)$ are time-delayed versions of a single random process X(t). Also, unlike the autocorrelation matrix of (6-53), it is not necessary that $\mathbf{X}(t)$ and $\mathbf{Y}(t)$ have the same number of dimensions. If $\mathbf{X}(t)$ is a column vector of size M and $\mathbf{Y}(t)$ is a column vector of size N, the crosscorrelation matrix will be an M × N matrix instead of a square matrix. This type of matrix may arise if X(t) is the single wideband random input to a system and the vector $\mathbf{Y}(t)$ is composed of responses at various points in the system. As discussed further in a subsequent chapter, the crosscorrelation matrix, which is now a 1 × N row vector, can be interpreted as the set of impulse responses at these various points.

Exercise 6-9.1

A random process has an autocorrelation function of the form

$$R_X(\tau) = 10e^{-|\tau|}\cos 2\pi\tau$$
Write the correlation matrix associated with four random variables defined for time instants separated by 0.5 second.

Answers: Elements in the first row include 3.677, −2.228, 10.0, −6.064

Exercise 6-9.2

A covariance matrix for a stationary random process has the form

[Matrix with four blank entries; its full layout is not recoverable from the source text.]

Fill in the blank spaces in this matrix.

Answers: 1, 0.6, 0.2, 0.4

PROBLEMS

6-1.1 A stationary random process having sample functions of X(t) has an autocorrelation function of

$$R_X(\tau) = 5e^{-5|\tau|}$$

Another random process has sample functions of

$$Y(t) = X(t) + bX(t - 0.1)$$

a) Find the value of b that minimizes the mean-square value of Y(t).
b) Find the value of the minimum mean-square value of Y(t).
c) If $|b| \le 1$, find the maximum mean-square value of Y(t).

6-1.2 For each of the autocorrelation functions given below, state whether the process it represents might be wide-sense stationary or cannot be wide-sense stationary.
d) $R_X(t_1, t_2) = \dfrac{\sin t_2\cos t_1 - \cos t_2\sin t_1}{t_2 - t_1}$

6-2.1 Consider a stationary random process having sample functions of the form shown below:

[Figure: sample function X(t) made up of rectangular pulses, with time marks at $t_0$, $t_0 + T$, $t_0 + 2T$.]

At periodic time instants $t_0 \pm nT$, a rectangular pulse of unit height and width $T_1$ may appear, or not appear, with equal probability and independently from interval to interval. The time $t_0$ is a random variable that is uniformly distributed over the period T, and $T_1 \le T/2$.

a) Find the mean value and the mean-square value of this process.
b) Find the autocorrelation function of this process.

6-2.2 Find the time autocorrelation function of the sample function in Problem 6-2.1.

6-2.3 Consider a stationary random process having sample functions of the form

$$X(t) = \sum_{n=-\infty}^{\infty} A_n\, g(t - t_0 - nT)$$

in which the $A_n$ are independent random variables that are +1 or −1 with equal probability and $t_0$ is a random variable that is uniformly distributed over the period T. Define a function

$$G(\tau) = \int_{-\infty}^{\infty} g(t)g(t+\tau)\,dt$$

and express the autocorrelation function of the process in terms of $G(\tau)$.
6-3.1 Which of the functions shown below cannot be valid autocorrelation functions? For each case explain why it is not an autocorrelation function.

[Figure: four candidate functions, panels (a) through (d), plotted over roughly −2 to 2.]

6-3.2 A random process has sample functions of the form

$$X(t) = Y\cos(\omega_0 t + \theta)$$

in which Y, $\omega_0$, and $\theta$ are statistically independent random variables. Assume that Y has a mean value of 3 and a variance of 9, that $\theta$ is uniformly distributed from $-\pi$ to $\pi$, and that $\omega_0$ is uniformly distributed from −6 to +6.

a) Is this process stationary? Is it ergodic?
b) Find the mean and mean-square value of the process.
c) Find the autocorrelation function of the process.

6-3.3 A stationary random process has an autocorrelation function of the form

$$R_X(\tau) = 100e^{-\tau^2}\cos 2\pi\tau + 10\cos 6\pi\tau + 36$$

a) Find the mean value, mean-square value, and the variance of this process.
b) What discrete frequency components are present?
c) Find the smallest value of $\tau$ for which the random variables X(t) and $X(t + \tau)$ are uncorrelated.
6-3.4 Consider a function of $\tau$ of the form

$$V(\tau) = \begin{cases} 1 - \dfrac{|\tau|}{T}, & |\tau| \le T \\ 0, & |\tau| > T \end{cases}$$

Take the Fourier transform of this function and show that it is a valid autocorrelation function only for T = 2.

6-4.1 A stationary random process is sampled at time instants separated by 0.01 seconds. The sample values are

$$\begin{array}{rr|rr|rr} k & X_k & k & X_k & k & X_k \\ \hline 0 & 0.19 & 7 & -1.24 & 14 & 1.45 \\ 1 & 0.29 & 8 & -1.88 & 15 & -0.82 \\ 2 & 1.44 & 9 & -0.31 & 16 & -0.25 \\ 3 & 0.83 & 10 & 1.18 & 17 & 0.23 \\ 4 & -0.01 & 11 & 1.70 & 18 & -0.91 \\ 5 & -1.23 & 12 & 0.57 & 19 & -0.19 \\ 6 & -1.47 & 13 & 0.95 & 20 & 0.24 \end{array}$$

a) Find the sample mean.
b) Find the estimated autocorrelation function $\hat{R}(0.01n)$ for n = 0, 1, 2, 3 using equation (6-15).
c) Repeat (b) using equation (6-16).

6-4.2 a) For the data of Problem 6-4.1, find an upper bound on the variance of the estimated autocorrelation function using the estimated values of part (b).
b) Repeat (a) using the estimated values of part (c).

6-4.3 An ergodic random process has an autocorrelation function of the form $R_X(\tau) = 10\,\mathrm{sinc}^2(\tau)$.

a) Over what range of $\tau$-values must the autocorrelation function of this process be estimated in order to include the first two zeros of the autocorrelation function?
b) If 21 estimates (M = 20) of the autocorrelation are to be made in the interval specified in (a), what should the sampling interval be?
c) How many sample values of the random process are required so that the rms error of the estimate is less than 5 percent of the true maximum value of the autocorrelation function?

6-4.4 Assume that the true autocorrelation function of the random process from which the data of Problem 6-4.1 comes has the form

and is zero elsewhere.

a) Find the values of A and T that provide the best fit to the estimated autocorrelation function values of Problem 6-4.1(b) in the least-mean-square sense. (See Sec. 4-6.)
b) Using the results of part (a) and equation (6-18), find another upper bound on the variance of the estimate of the autocorrelation function. Compare with the result of Problem 6-4.2(a).

6-4.5 A random process has an autocorrelation function of the form

$$R_X(\tau) = 10e^{-5|\tau|}\cos 20\tau$$

If this process is sampled every 0.01 second, find the number of samples required to estimate the autocorrelation function with a standard deviation that is no more than 1% of the variance of the process.

6-4.6 The following MATLAB program generates 1000 samples of a bandlimited noise process. Make a plot of the sample function and the time autocorrelation function of the process. Make an expanded plot of the autocorrelation function for lag values of ±0.1 second around the origin. The sampling rate is 1000 Hz.

```matlab
x = randn(1,2000);          % white Gaussian noise
[b,a] = butter(4,20/500);   % 4th-order 20-Hz low-pass filter
y = filter(b,a,x);          % bandlimited noise
y = y/std(y);               % normalize to unit standard deviation
```

6-4.7 Use the computer to make plots of the time autocorrelation functions of the following deterministic signals.

a) rect(400t)
b) sin(2000πt) rect(400t)
c) cos(2000πt) rect(400t)

6-5.1 Consider a random process having sample functions of the form shown in Figure 6-4(a) and assume that the time intervals between switching times are independent, exponentially distributed random variables. (See Sec. 2-7.) Show that the autocorrelation function of this process is a two-sided exponential as shown in Figure 6-4(b).

6-5.2 Suppose that each sample function of the random process in Problem 6-5.1 is switching between 0 and 2A instead of between ±A. Find the autocorrelation function of the process now.

6-5.3 Determine the mean value and the variance of each of the random processes having the following autocorrelation functions:

a) $10e^{-\tau^2}$
b) $10e^{-\tau^2}\cos 2\pi\tau$
c) $10\,\dfrac{\tau^2 + 8}{\tau^2 + 4}$

6-5.4 Consider a random process having an autocorrelation function of

a) Find the mean and variance of this process.
b) Is this process differentiable? Why?

6-7.1 Two independent stationary random processes having sample functions of X(t) and Y(t) have autocorrelation functions of

$$R_X(\tau) = 25e^{-10|\tau|}\cos 100\pi\tau$$

and

$$R_Y(\tau) = 16\,\frac{\sin 50\pi\tau}{50\pi\tau}$$

a) Find the autocorrelation function of X(t) + Y(t).
b) Find the autocorrelation function of X(t) − Y(t).
c) Find both crosscorrelation functions of the two processes defined by (a) and (b).
d) Find the autocorrelation function of X(t)Y(t).

6-7.2 For the two processes of Problem 6-7.1(c) find the maximum value that the crosscorrelation functions can have using the bound of equation (6-29). Compare this bound with the actual maximum values that these crosscorrelation functions have.

6-7.3 A stationary random process has an autocorrelation function of

$$R_X(\tau) = \frac{\sin\tau}{\tau}$$

a) Find $R_{X\dot{X}}(\tau)$.
b) Find $R_{\dot{X}}(\tau)$.

6-7.4 Two stationary random processes have a crosscorrelation function of

$$R_{XY}(\tau) = 16e^{-(\tau - 1)^2}$$

Find the crosscorrelation function of the derivative of X(t) and Y(t). That is, find $R_{\dot{X}Y}(\tau)$.

6-8.1 A sinusoidal signal has the form

$$X(t) = 0.01\sin(100t + \theta)$$

in which $\theta$ is a random variable that is uniformly distributed between $-\pi$ and $\pi$. This signal is observed in the presence of independent noise whose autocorrelation function is

a) Find the value of the autocorrelation function of the sum of signal and noise at $\tau = 0$.
b) Find the smallest value of $\tau$ for which the peak value of the autocorrelation function of the signal is 10 times larger than the autocorrelation function of the noise.

6-8.2 One way of detecting a sinusoidal signal in noise is to use a correlator. In this device, the incoming signal plus noise is multiplied by a locally generated reference signal having the same form as the signal to be detected, and the average value of the product is extracted with a low-pass filter. Suppose the signal and noise of Problem 6-8.1 are multiplied by a reference signal of the form
$$r(t) = 10\cos(100t + \phi)$$

The product is

$$Z(t) = r(t)X(t) + r(t)N(t)$$

a) Find the expected value of Z(t), where the expectation is taken with respect to the noise and $\phi$ is assumed to be a fixed, but unknown, value.
b) For what value of $\phi$ is the expected value of Z(t) the greatest?

6-8.3 Detection of a pulse of sinusoidal oscillation in the presence of noise can be accomplished by crosscorrelating the signal plus noise with a pulse signal at the same frequency as the sinusoid. The following MATLAB program generates 1000 samples of a random process with such a pulse. The sampling frequency is 1000 Hz. Compute the crosscorrelation function of this signal and a sinusoid pulse, $\sin(100\pi t)$, that is 100 ms long. (Hint: Convolve a 100-ms reversed sinusoidal pulse with the signal using the MATLAB commands fliplr and conv.)

```matlab
% P6_8_3
t1 = 0.0:0.001:0.099;       % 100-ms time axis at 1000 Hz
s1 = sin(100*pi*t1);        % sinusoidal pulse
s = zeros(1,1000);
s(700:799) = s1;            % place the pulse in the record
randn('seed',1000)          % legacy seeding syntax
n1 = randn(1,1000);         % unit-variance Gaussian noise
x = s + n1;                 % pulse plus noise
```

6-8.4 Use the computer to make plots of the time crosscorrelation functions of the following pairs of signals.

a) x(t) = rect(400t), y(t) = sin(2000πt) rect(400t)
b) x(t) = sin(2000πt) rect(400t), y(t) = cos(2000πt) rect(400t)

6-8.5 Assume X(t) is a zero mean, stationary Gaussian random process. Let $X_1 = X(t_1)$ and $X_2 = X(t_2)$ be samples of the process at $t_1$ and $t_2$ having a correlation coefficient of

$$\rho = \frac{E\{X_1X_2\}}{\sigma_1\sigma_2} = \frac{R_X(t_2 - t_1)}{R_X(0)}$$

Further let $Y_1 = g_1(X_1)$ and $Y_2 = g_2(X_2)$ be random variables obtained from $X_1$ and $X_2$ by deterministic (not necessarily linear) functions $g_1(\cdot)$ and $g_2(\cdot)$. Then an important
result from probability theory called Price's Theorem relates the correlation function of $Y_1$ and $Y_2$ to $\rho$ in the following manner:

$$\frac{d^nR_Y}{d\rho^n} = R_X^n(0)\,E\left\{\frac{d^ng_1(X_1)}{dX_1^n}\cdot\frac{d^ng_2(X_2)}{dX_2^n}\right\}$$

This theorem can be used to evaluate readily the correlation function of Gaussian random processes after certain nonlinear operations. Consider the case of hard limiting such that

$$g_1(X) = g_2(X) = \begin{cases} +1, & X > 0 \\ -1, & X < 0 \end{cases}$$

a) Using n = 1 in Price's Theorem show that

$$R_Y(t_1, t_2) = \frac{2}{\pi}\sin^{-1}(\rho) \quad\text{or}\quad \rho = \sin\left[\frac{\pi}{2}R_Y(t_1, t_2)\right]$$

b) Show how $R_Y(t_1, t_2)$ can be computed without carrying out multiplication by using an "exclusive or" circuit. This procedure is called polarity coincidence correlation.

6-8.6 It is desired to estimate the time delay between the occurrence of a zero mean, stationary Gaussian random process and an echo of that process. The problem is complicated by the presence of an additive noise that is also a zero mean, stationary Gaussian random process. Let X(t) be the original process and $Y(t) = aX(t - \tau) + N(t)$ be the echo with relative amplitude a, time delay $\tau$, and noise N(t). The following MATLAB M-file generates samples of the signals X(t) and Y(t). It can be assumed that the signals are white, bandlimited signals sampled at a 1 MHz rate.

```matlab
% P6_8_6.m
clear x; clear y
randn('seed', 2000)                                 % legacy seeding syntax
g = round(200*sqrt(pi));                            % delay in samples
z = randn(1, 10000 + g);
y = sqrt(0.1)*z(g:10000+g-1) + randn(1,10000);      % -10dB SNR echo plus noise
x = z(1:10000);                                     % original process
```

a) Write a program to find $\tau$ using the peak of the correlation function found using polarity coincidence correlation. (Hint: use the sign function and the == operator to make a polarity coincidence correlator.)
b) Estimate the value of a given that the variance of X(t) is unity.
6-8.7 Vibration sensors are mounted on the front and rear axles of a moving vehicle to pick up the random vibrations due to the roughness of the road surface. The signal from the front sensor may be modeled as

$$f(t) = s(t) + n_1(t)$$

where the signal s(t) and the noise $n_1(t)$ are from independent random processes. The signal from the rear sensor is modeled as

$$r(t) = s(t - \tau_1) + n_2(t)$$

where $n_2(t)$ is noise that is independent of both s(t) and $n_1(t)$. All processes have zero mean. The delay $\tau_1$ depends upon the spacing of the sensors and the speed of the vehicle.

a) If the sensors are placed 5 m apart, derive a relationship between $\tau_1$ and the vehicle speed v.
b) Sketch a block diagram of a system that can be used to measure vehicle speed over a range of 5 m per second to 50 m per second. Specify the maximum and minimum delay values that are required if an analog correlator is used.
c) Why is there a minimum speed that can be measured this way?
d) If a digital correlator is used, and the signals are each sampled at a rate of 12 samples per second, what is the maximum vehicle speed that can be measured?

6-8.8 The angle to distant stars can be measured by crosscorrelating the outputs of two widely separated antennas and measuring the delay required to maximize the crosscorrelation function. The geometry to be considered is shown below. In this system, the distance between antennas is nominally 500 m, but has a standard deviation of 0.01 m. It is desired to measure the angle $\theta$ with a standard deviation of no more than 1 milliradian for any $\theta$ between 0 and 1.4 radians. Find an upper bound on the standard deviation of the delay measurement in order to accomplish this. (Hint: Use the total differential to linearize the relation.)

[Figure: two antennas receiving a signal from a distant star at angle θ; the antenna outputs are $s(t - \tau_1) + n_2(t)$ and $s(t) + n_1(t)$.]
6-9.1 A stationary random process having an autocorrelation function of

$$R_X(\tau) = 36e^{-2|\tau|}\cos\pi\tau$$

is sampled at periodic time instants separated by 0.5 second. Write the covariance matrix for four consecutive samples taken from this process.

6-9.2 A Gaussian random vector has a covariance matrix of

[3 × 3 covariance matrix $\boldsymbol{\Lambda}$ with entries including 1 and 0.5; the exact layout is not recoverable from the source text.]

Find the expected value, $E[\mathbf{X}^T\boldsymbol{\Lambda}^{-1}\mathbf{X}]$.

6-9.3 A transversal filter is a tapped delay line with the outputs from the various taps weighted and summed as shown below.

[Figure: tapped delay line with input X(t), weighted taps, and summed output Y(t).]

If the delay between taps is $\Delta t$, the outputs from the taps can be expressed as a vector by

$$\mathbf{X}(t) = \begin{bmatrix} X(t) \\ X(t - \Delta t) \\ \vdots \\ X(t - N\Delta t) \end{bmatrix}$$

Likewise, the weighting factors on the various taps can be written as a vector
$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}$$

a) Write an expression for the output of the transversal filter, Y(t), in terms of the vectors $\mathbf{X}(t)$ and $\mathbf{a}$.
b) If X(t) is from a stationary random process with an autocorrelation function of $R_X(\tau)$, write an expression for the autocorrelation function $R_Y(\tau)$.

6-9.4 Let the input to the transversal filter of Problem 6-9.3 have an autocorrelation function of

$$R_X(\tau) = 1 - \frac{|\tau|}{\Delta t}, \qquad |\tau| \le \Delta t$$

and zero elsewhere.

a) If the transversal filter has 4 taps (i.e., N = 3) and the weighting factor for each tap is $a_i = 1$ for all i, determine and sketch the autocorrelation function of the output.
b) Repeat part (a) if the weighting factors are $a_i = 4 - i$, i = 0, 1, 2, 3.

References

See the references for Chapter 1. Of particular interest for the material of this chapter are the books by Davenport and Root, Helstrom, and Papoulis.
CHAPTER 7

Spectral Density

7-1 Introduction

The use of Fourier transforms and Laplace transforms in the analysis of linear systems is widespread and frequently leads to much saving in labor. The principal reason for this simplification is that the convolution integral of time-domain methods is replaced by simple multiplication when frequency-domain methods are used.

In view of this widespread use of frequency-domain methods, it is natural to ask if such methods are still useful when the inputs to the system are random. The answer to this question is that they are still useful, but that some modifications are required and that a little more care is necessary in order to avoid pitfalls. However, when properly used, frequency-domain methods offer essentially the same advantages in dealing with random signals as they do with nonrandom signals.

Before beginning this discussion, it is desirable to review briefly the frequency-domain representation of a nonrandom time function. The most natural representation of this sort is the Fourier transform, which leads to the concept of frequency spectrum. Thus, the Fourier transform of some nonrandom time function, y(t), is defined to be

$$Y(\omega) = \int_{-\infty}^{\infty} y(t)e^{-j\omega t}\,dt \tag{7-1}$$

If y(t) is a voltage, say, then $Y(\omega)$ has the units of volts per (rad/s) and represents the relative magnitude and phase of steady-state sinusoids (of frequency $\omega$) that can be summed to produce the original y(t). Thus, the magnitude of the Fourier transform has the physical significance of being the amplitude density as a function of frequency and, as such, gives a clear indication of how the energy of y(t) is distributed with respect to frequency. It is often convenient to measure the frequency in hertz rather than radians per second, in which case the Fourier transform is written as
$$F_y(f) = \int_{-\infty}^{\infty} y(t)e^{-j2\pi ft}\,dt$$

If y(t) is a voltage, the units of $F_y(f)$ would be V/Hz. It is generally quite straightforward to convert between the variables f and $\omega$ by making the substitutions $f = \omega/2\pi$ in F(f) to obtain $F(\omega)$ and $\omega = 2\pi f$ in $F(\omega)$ to obtain F(f). Both representations will be used in the following sections.

It might seem reasonable to use exactly the same procedure in dealing with random signals, that is, to use the Fourier transform of any particular sample function x(t), defined by

$$F_X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt$$

as the frequency-domain representation of the random process. This is not possible, however, for at least two reasons. In the first place, the Fourier transform will be a random variable over the ensemble (for any fixed $\omega$), since it will have a different value for each member of the ensemble of possible sample functions. Hence, it cannot be a frequency representation of the process, but only of one member of the process. However, it might still be possible to use this function by finding its expected value (or mean) over the ensemble if it were not for the second reason. The second, and more basic, reason for not using the $F_X(\omega)$ just defined is that, for stationary processes at least, it almost never exists! It may be recalled that one of the conditions for a time function to be Fourier transformable is that it be absolutely integrable; that is,

$$\int_{-\infty}^{\infty} |x(t)|\,dt < \infty \tag{7-2}$$

This condition can never be satisfied by any nonzero sample function from a wide-sense stationary random process. The Fourier transform in the ordinary sense will never exist in this case, although it may occasionally exist in the sense of generalized functions, including impulses, and so forth.

Now that the usual Fourier transform has been ruled out as a means of obtaining a frequency-domain representation for a random process, the next thought is to use the Laplace transform, since this contains a built-in convergence factor.
Of course, the usual one-sided transform, which considers f(t) for $t \ge 0$ only, is not applicable for a wide-sense stationary process; however, this is no real difficulty since the two-sided Laplace transform is good for negative as well as positive values of time. Once this is done, the Laplace transform for almost any sample function from a stationary random process will exist.

It turns out, however, that this approach is not so promising as it looks, since it merely transfers the existence problems from the transform to the inverse transform. A study of these problems requires a knowledge of complex variable theory that is beyond the scope of the present discussion. Hence, it appears that the simplest mathematically acceptable approach is to return to the Fourier transform and employ an artifice that will ensure existence. Even in this case it will not be possible to justify rigorously all the steps, and a certain amount of the procedure will have to be accepted on faith.
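The claim that a nonzero stationary sample function cannot satisfy (7-2) can be seen numerically. The Python sketch below assumes, only for illustration, a sampled white Gaussian sequence with mean-square value 4 and a 0.01-s sample spacing. As the observation interval $[-T, T]$ grows, both $\int|x(t)|\,dt$ and $\int x^2(t)\,dt$ grow roughly in proportion to T, so neither has a finite limit, while the energy divided by 2T settles near the mean-square value. That ratio is the kind of time-averaged quantity the truncation artifice is designed to work with.

```python
import random

random.seed(3)
dt = 0.01                                   # hypothetical sample spacing (s)
sigma = 2.0                                 # so the mean-square value is 4
samples = [random.gauss(0.0, sigma) for _ in range(200001)]
mid = len(samples) // 2                     # index of the sample at t = 0

results = {}
for T in (10.0, 100.0, 1000.0):
    k = int(round(T / dt))
    window = samples[mid - k: mid + k + 1]               # samples with |t| <= T
    abs_area = sum(abs(v) for v in window) * dt          # approximates integral of |x(t)| over [-T, T]
    energy = sum(v * v for v in window) * dt             # approximates integral of x(t)^2 over [-T, T]
    results[T] = {"abs_area": abs_area, "energy": energy, "power": energy / (2 * T)}
    print(f"T = {T:6.0f}:  |x| area = {abs_area:9.1f}   energy = {energy:9.1f}   "
          f"energy/2T = {energy / (2 * T):.3f}")
```

The sums are simple rectangle-rule approximations of the integrals, which is adequate here because only the growth with T matters, not the exact values.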
7-2 Relation of Spectral Density to the Fourier Transform

To use the Fourier transform technique it is necessary to modify the sample functions of a stationary random process in such a way that the transform of each sample function exists. There are many ways in which this might be done, but the simplest one is to define a new sample function having finite duration. Thus, let

$$X_T(t) = \begin{cases} X(t), & |t| \le T < \infty \\ 0, & |t| > T \end{cases} \tag{7-3}$$

Note that the truncated time function $X_T(t)$ will satisfy the condition of (7-2), as long as T remains finite, provided that the stationary process from which it is taken has a finite mean-square value. Hence, $X_T(t)$ will be Fourier transformable. In fact, $X_T(t)$ will satisfy the more stringent requirement for square-integrable functions; that is,

$$\int_{-\infty}^{\infty} |X_T(t)|^2\, dt < \infty \tag{7-4}$$

This condition will be needed in the subsequent development.

Since $X_T(t)$ is Fourier transformable, its transform may be written as

$$F_X(\omega) = \int_{-\infty}^{\infty} X_T(t)\, e^{-j\omega t}\, dt, \qquad T < \infty \tag{7-5}$$

Eventually, it will be necessary to let T increase without limit. The purpose of the following discussion is to show that the expected value of $|F_X(\omega)|^2/2T$ does exist in the limit, even though the $F_X(\omega)$ for any one sample function does not. The first step in demonstrating this is to apply Parseval's theorem to $X_T(t)$ and $F_X(\omega)$.¹ Thus, since $X_T(t) = 0$ for $|t| > T$,

$$\int_{-T}^{T} X_T^2(t)\, dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |F_X(\omega)|^2\, d\omega \tag{7-6}$$

Note that $|F_X(\omega)|^2 = F_X(\omega)F_X(-\omega)$, since $F_X(-\omega)$ is the complex conjugate of $F_X(\omega)$ when $X_T(t)$ is a real time function.

¹ Parseval's theorem states that if f(t) and g(t) are transformable time functions with transforms F(ω) and G(ω), respectively, then
$$\int_{-\infty}^{\infty} f(t)g(t)\, dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)G(-\omega)\, d\omega$$

Since the quantity being sought is the distribution of average power as a function of frequency, the next step is to average both sides of (7-6) over the total time, 2T. Hence, dividing both sides by 2T gives
$$\frac{1}{2T}\int_{-T}^{T} X_T^2(t)\, dt = \frac{1}{4\pi T}\int_{-\infty}^{\infty} |F_X(\omega)|^2\, d\omega \tag{7-7}$$

The left side of (7-7) is seen to be proportional to the average power of the sample function in the time interval from $-T$ to $T$. More exactly, it is the square of the effective value of $X_T(t)$. Furthermore, for an ergodic process, this quantity would approach the mean-square value of the process as T approached infinity.

However, it is not possible at this stage to let T approach infinity, since $F_X(\omega)$ simply does not exist in the limit. It should be remembered, though, that $F_X(\omega)$ is a random variable with respect to the ensemble of sample functions from which X(t) was taken. It is reasonable to suppose (and can be rigorously proved) that the limit of the expected value of $(1/T)|F_X(\omega)|^2$ does exist, since the integral of this "always positive" quantity certainly does exist, as shown by (7-4). Hence, we take the expectation of both sides of (7-7), interchange the expectation and the integration, and then take the limit as $T \to \infty$. This gives

$$E\left\{\frac{1}{2T}\int_{-T}^{T} X_T^2(t)\, dt\right\} = E\left\{\frac{1}{4\pi T}\int_{-\infty}^{\infty} |F_X(\omega)|^2\, d\omega\right\}$$

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} \overline{X^2}\, dt = \lim_{T\to\infty}\frac{1}{4\pi T}\int_{-\infty}^{\infty} E\{|F_X(\omega)|^2\}\, d\omega$$

$$\langle \overline{X^2} \rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lim_{T\to\infty}\frac{E\{|F_X(\omega)|^2\}}{2T}\, d\omega \tag{7-8}$$

For a stationary process, the time average of the mean-square value is equal to the mean-square value, and (7-8) can be written as

$$\overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lim_{T\to\infty}\frac{E\{|F_X(\omega)|^2\}}{2T}\, d\omega \tag{7-9}$$

The integrand of the right side of (7-9), which will be designated by the symbol $S_X(\omega)$, is called the spectral density of the random process. Thus,

$$S_X(\omega) = \lim_{T\to\infty}\frac{E[|F_X(\omega)|^2]}{2T} \tag{7-10}$$

It must be remembered that it is not possible to let $T \to \infty$ before taking the expectation. The expression for the spectral density in terms of the variable f is

$$S_X(f) = \lim_{T\to\infty}\frac{E\{|F_X(f)|^2\}}{2T} \tag{7-11}$$

An important point, and one that sometimes leads to confusion, is that the units of $S_X(\omega)$ and $S_X(f)$ are the same. If X(t) is a voltage, then the units of the spectral density are V²/Hz, and the mean-square value of X(t) as given by Equation (7-9) is
$$\overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, d\omega \tag{7-12}$$

$$\overline{X^2} = \int_{-\infty}^{\infty} S_X(f)\, df \tag{7-13}$$

where $S_X(f)$ is obtained from $S_X(\omega)$ by substituting $\omega = 2\pi f$. For example, a frequently occurring spectral density has the form

$$S_X(\omega) = \frac{2a}{\omega^2 + a^2}$$

The corresponding spectral density in terms of f would be

$$S_X(f) = \frac{2a}{(2\pi f)^2 + a^2}$$

The mean-square value can be computed from either. Using the ω form,

$$\overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{2a}{\omega^2 + a^2}\, d\omega = \frac{1}{2\pi}\left[\frac{2a}{a}\tan^{-1}\left(\frac{\omega}{a}\right)\right]_{-\infty}^{\infty} = \frac{1}{\pi}\left(\frac{\pi}{2} + \frac{\pi}{2}\right) = 1$$

Although the equations for the spectral density in the variables f and ω appear to be different, they give the same magnitude of the spectral density at corresponding frequencies. For example, consider the spectral density at the origin and at 1 rad/second. From $S_X(\omega)$ it follows that

$$S_X(\omega = 0) = \frac{2a}{0 + a^2} = \frac{2}{a}\ \text{V}^2/\text{Hz}, \qquad S_X(\omega = 1) = \frac{2a}{1 + a^2}\ \text{V}^2/\text{Hz}$$

For $S_X(f)$ the corresponding frequencies are $f = 0$ and $f = 1/2\pi$:

$$S_X(f = 0) = \frac{2a}{0 + a^2} = \frac{2}{a}\ \text{V}^2/\text{Hz}, \qquad S_X\!\left(f = \frac{1}{2\pi}\right) = \frac{2a}{[2\pi(1/2\pi)]^2 + a^2} = \frac{2a}{1 + a^2}\ \text{V}^2/\text{Hz}$$

The choice of expressing spectral density as a function of ω or f depends on the mathematical form of the expression and the preferences of the person carrying out the analysis. For example,
when there are impulses in the spectrum the use of f is often simpler, and when integration using complex variable theory is employed it is easier to use ω. However, in all cases the final results will be the same.

The physical interpretation of spectral density can be made somewhat clearer by thinking in terms of average power, although this is a fairly specialized way of looking at it. If X(t) is a voltage or current associated with a 1 Ω resistance, then $\overline{X^2}$ is just the average power dissipated in that resistance. The spectral density, $S_X(\omega)$, can then be interpreted as the average power associated with a bandwidth of 1 Hz centered at ω/2π Hz. [Note that the unit of bandwidth is the hertz (or cycle per second) and not the radian per second, because of the factor of 1/(2π) in the integral of (7-12).] Because the relationship of the spectral density to the average power of the random process is often the one of interest, the spectral density is frequently designated as the "power density spectrum."

The spectral density defined above is sometimes referred to as the "two-sided spectral density" since it exists for both positive and negative values of ω. Some authors prefer to define a "one-sided spectral density," which exists only for positive values of f. If this one-sided spectral density is designated by $G_X(f)$, then the mean-square value of the random process is given by

$$\overline{X^2} = \int_0^{\infty} G_X(f)\, df$$

Since the one-sided spectral density is defined for positive frequencies only, it may be related to the two-sided spectral density by

$$G_X(f) = \begin{cases} 2S_X(f), & f \ge 0 \\ 0, & f < 0 \end{cases}$$

Both the one-sided spectral density and the two-sided spectral density are commonly used in the engineering literature. The reader is cautioned that other references may use either, and it is essential to be aware of the definition being employed.

The foregoing analysis of spectral density has been carried out in somewhat more detail than is customary in an introductory discussion.
The reason for this is an attempt to avoid some of the mathematical pitfalls that a more superficial approach might gloss over. There is no doubt that this method makes the initial study of spectral density more difficult for the reader, but it is felt that the additional rigor is well worth the effort. Furthermore, even if all of the implications of the discussion are not fully understood, it should serve to make the reader aware of the existence of some of the less obvious difficulties of frequency-domain methods.

Another approach to spectral density, which treats it as a defined quantity based on the autocorrelation function, is given in Section 7-6. From the standpoint of application, such a definition is probably more useful than the more basic approach given here and is also easier to understand. It does not, however, make the physical interpretation as apparent as the basic derivation does.

Before turning to a more detailed discussion of the properties of spectral densities, it may be noted that in system analysis the spectral density of the input random process will play the same
role as does the transform of the input in the case of nonrandom signals. The major difference is that spectral density represents a power density rather than a voltage density. Thus, it will be necessary to define a power transfer function for the system rather than a voltage transfer function.

Exercise 7-2.1

A stationary random process has a two-sided spectral density given by

$$S_X(f) = \begin{cases} 10, & a < |f| < b \\ 0, & \text{elsewhere} \end{cases}$$

a) Find the mean-square value of the process if a = 4 and b = 5.
b) Find the mean-square value of the process if a = 0 and b = 5.

Answers: 100, 20

Exercise 7-2.2

A stationary random process has a two-sided spectral density given by

$$S_X(\omega) = \frac{24}{\omega^2 + 16}\ \text{V}^2/\text{Hz}$$

a) Find the mean-square value of the process.
b) Find the mean-square value of the process in the frequency band of ±1 Hz centered at the origin.

Answers: 3 V², 1.917 V²

7-3 Properties of Spectral Density

Most of the important properties of spectral density are summarized by the simple statement that it is a real, positive, even function of ω. It is known from the study of Fourier transforms that the squared magnitude of the transform of a real time function is real and nonnegative. It is also even, because $F_X(-\omega)$ is the complex conjugate of $F_X(\omega)$. Hence, the expected value will also possess the same properties.
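The defining relation (7-11) and the properties just stated can be tried out on simulated data. The following is a minimal sketch under these assumptions:

- Each sample function is a hypothetical first-order autoregressive sequence with unit variance, with correlation coefficient `rho` between adjacent samples.
- The transform $F_X(f)$ of the truncated record of length 2T is approximated by `dt` times the discrete Fourier transform.
- The expectation is replaced by an average over a finite ensemble.

The estimate should come out real, nonnegative and even. By the discrete form of Parseval's theorem, its area equals the average of X² over the simulated records, as (7-13) requires.

```python
import numpy as np

def ensemble_psd(n_funcs=400, n_samples=1024, dt=0.01, rho=0.9, seed=1):
    """Estimate S_X(f) ~ E{|F_X(f)|^2} / (2T) from an ensemble of truncated sample functions.

    Assumed model (hypothetical): x[k] = rho*x[k-1] + sqrt(1 - rho^2)*w[k], w ~ N(0, 1),
    started in its stationary distribution so every sample has variance 1.
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_funcs, n_samples))
    x[:, 0] = rng.standard_normal(n_funcs)
    gain = np.sqrt(1.0 - rho**2)
    for k in range(1, n_samples):
        x[:, k] = rho * x[:, k - 1] + gain * rng.standard_normal(n_funcs)

    two_T = n_samples * dt                       # record length 2T in seconds
    F = dt * np.fft.fft(x, axis=1)               # F_X(f_k) ~ dt * DFT, f_k = k/(2T)
    S = np.mean(np.abs(F) ** 2, axis=0) / two_T  # ensemble average, then divide by 2T
    f = np.fft.fftfreq(n_samples, d=dt)          # two-sided frequency grid in Hz
    mean_square = np.mean(x**2)                  # ensemble and time average of X^2
    return f, S, mean_square, two_T

f, S, ms, two_T = ensemble_psd()
df = 1.0 / two_T
print("area under S_X(f):", np.sum(S) * df)   # discrete counterpart of (7-13)
print("direct mean-square:", ms)
```

With a single sample function (`n_funcs=1`) the estimate fluctuates strongly from one frequency to the next, however long the record is. It is the ensemble average that makes it settle.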
A special class of spectral densities, which is more commonly used than any other, is said to be rational, since it is composed of a ratio of polynomials. Since the spectral density is an even function of ω, these polynomials involve only even powers of ω. Thus, it is represented by

$$S_X(\omega) = \frac{S_0(\omega^{2n} + a_{2n-2}\omega^{2n-2} + \cdots + a_2\omega^2 + a_0)}{\omega^{2m} + b_{2m-2}\omega^{2m-2} + \cdots + b_2\omega^2 + b_0} \tag{7-14}$$

If the mean-square value of the random process is finite, then the area under $S_X(\omega)$ must also be finite, from (7-12). In this case, it is necessary that m > n. This condition will always be assumed here except for a very special case of white noise. White noise is a term applied to a random process for which the spectral density is constant for all ω; that is, $S_X(\omega) = S_0$. Although such a process cannot exist physically (since it has infinite mean-square value), it is a convenient mathematical fiction, which greatly simplifies many computations that would otherwise be very difficult. The justification and illustration of the use of this concept are discussed in more detail later.

As an example of a rational spectral density consider the function

$$S_X(\omega) = \frac{16(\omega^4 + 12\omega^2 + 32)}{\omega^6 + 18\omega^4 + 92\omega^2 + 120} = \frac{16(\omega^2 + 4)(\omega^2 + 8)}{(\omega^2 + 2)(\omega^2 + 6)(\omega^2 + 10)}$$

Note that this function satisfies all of the requirements that spectral densities be real, positive, and even functions of ω. In addition, the denominator is of higher degree than the numerator, so the spectral density vanishes at ω = ∞. Thus, the process described by this spectral density will have a finite mean-square value. The factored form of the spectral density is often useful in evaluating the integral required to obtain the mean-square value of the process. This operation is discussed in more detail in a subsequent section.

It is also possible to have spectral densities that are not rational.
A typical example of this is the spectral density

$$S_X(\omega) = \left(\frac{\sin 5\omega}{5\omega}\right)^2$$

As is seen later, this is the spectral density of a random binary signal.

Spectral densities of the type discussed above are continuous and, as such, cannot represent random processes having dc or periodic components. The reason is not difficult to understand when spectral density is interpreted as average power per unit bandwidth. Any dc component in a random process represents a finite average power in zero bandwidth, since this component has a discrete frequency spectrum. Finite power in zero bandwidth is equivalent to an infinite power density. Hence, we would expect the spectral density in this case to be infinite at zero frequency but finite elsewhere; that is, it would contain a δ function at ω = 0. A similar argument for periodic components would justify the existence of δ functions at these discrete frequencies. A rigorous derivation of these results will serve to make the argument more precise and, at
the same time, illustrate the use of the defining equation, (7-10), in the calculation of spectral densities.

To carry out the desired derivation, consider a stationary random process having sample functions of the form

$$X(t) = A + B\cos(2\pi f_0 t + \theta) \tag{7-15}$$

where A, B, and $f_0$ are constants and θ is a random variable uniformly distributed between 0 and 2π. The Fourier transform of the truncated sample function, $X_T(t)$, is

$$F_X(f) = \int_{-T}^{T}[A + B\cos(2\pi f_0 t + \theta)]\, e^{-j2\pi f t}\, dt$$

It is evident that $F_X(f)$ is the Fourier transform of the function given in (7-15) multiplied by a rectangular pulse of unit amplitude and duration 2T. Therefore the transform of this product is the convolution of the transforms of the two functions, i.e.,

$$\begin{aligned} F_X(f) &= \mathcal{F}\left\{\operatorname{rect}\left(\frac{t}{2T}\right)[A + B\cos(2\pi f_0 t + \theta)]\right\} \\ &= 2T\operatorname{sinc}(2Tf) * \left[A\delta(f) + \tfrac{1}{2}B\delta(f + f_0)e^{-j\theta} + \tfrac{1}{2}B\delta(f - f_0)e^{j\theta}\right] \\ &= 2AT\operatorname{sinc}(2Tf) + BT\left\{\operatorname{sinc}[2T(f + f_0)]e^{-j\theta} + \operatorname{sinc}[2T(f - f_0)]e^{j\theta}\right\} \end{aligned} \tag{7-16}$$

The square of the magnitude of $F_X(f)$ will have nine terms. Some of these are independent of the random variable θ, and the rest involve either $e^{\pm j\theta}$ or $e^{\pm j2\theta}$. In anticipation of the result that the expectation of all terms involving θ will vanish, it is convenient to write the squared magnitude in symbolic form without bothering to determine all of the coefficients. Thus,

$$|F_X(f)|^2 = 4A^2T^2\operatorname{sinc}^2(2Tf) + B^2T^2\left\{\operatorname{sinc}^2[2T(f + f_0)] + \operatorname{sinc}^2[2T(f - f_0)]\right\} + C(f)e^{-j\theta} + C(-f)e^{j\theta} + D(f)e^{-j2\theta} + D(-f)e^{j2\theta} \tag{7-17}$$

Now consider the expected value of any term involving θ. These are all of the form $G(f)e^{jn\theta}$, and the expected value is

$$E\{G(f)e^{jn\theta}\} = G(f)\int_0^{2\pi}\frac{1}{2\pi}e^{jn\theta}\, d\theta = G(f)\left[\frac{e^{jn\theta}}{2\pi jn}\right]_0^{2\pi} = 0, \qquad n = \pm 1, \pm 2, \ldots \tag{7-18}$$

Thus the last terms in (7-17) will vanish and the expected value will become

$$E\{|F_X(f)|^2\} = 4A^2T^2\operatorname{sinc}^2(2Tf) + B^2T^2\left\{\operatorname{sinc}^2[2T(f + f_0)] + \operatorname{sinc}^2[2T(f - f_0)]\right\} \tag{7-19}$$

From (7-10), the spectral density is
$$S_X(f) = \lim_{T\to\infty}\frac{1}{2T}\left\{4A^2T^2\operatorname{sinc}^2(2Tf) + B^2T^2\left[\operatorname{sinc}^2[2T(f + f_0)] + \operatorname{sinc}^2[2T(f - f_0)]\right]\right\}$$

To investigate the limit, consider the essential part of the first term; that is,

$$\lim_{T\to\infty}\{T\operatorname{sinc}^2(2Tf)\} = \lim_{T\to\infty} T\,\frac{\sin^2(2\pi Tf)}{(2\pi Tf)^2} \tag{7-20}$$

Clearly this is zero for f ≠ 0, since $\sin^2(2\pi Tf)$ cannot exceed unity while the denominator grows as T² and the numerator only as T. However, when f = 0, sinc²(0) = 1 and the limit is ∞. Hence, one can write

$$\lim_{T\to\infty} 2T\operatorname{sinc}^2(2Tf) = K\delta(f) \tag{7-21}$$

where K represents the area of the delta function and has yet to be evaluated. The value of K can be found by equating the areas on both sides of equation (7-21):

$$\lim_{T\to\infty}\int_{-\infty}^{\infty} 2T\,\frac{\sin^2(2\pi Tf)}{(2\pi Tf)^2}\, df = \int_{-\infty}^{\infty} K\delta(f)\, df$$

From the tabulated integral

$$\int_{-\infty}^{\infty}\frac{\sin^2(at)}{t^2}\, dt = |a|\pi$$

it follows that the left-hand side of this area equation has a value of unity, and therefore K = 1.

A similar procedure can be used for the other terms in the expression for $S_X(f)$. It is left as an exercise for the reader to show that the final result becomes

$$S_X(f) = A^2\delta(f) + \frac{B^2}{4}[\delta(f + f_0) + \delta(f - f_0)] \tag{7-22}$$

This spectral density is shown in Figure 7-1. Note that the power is concentrated at the frequencies of the discrete components, i.e., f = 0 and f = ±f_0, and that the phase of the ac components is not involved.

Figure 7-1 Spectral density of dc and sinusoidal components ($S_X(f)$ versus f).

In terms of the variable ω the expression for the spectral density is
$$S_X(\omega) = 2\pi A^2\delta(\omega) + \frac{\pi B^2}{2}[\delta(\omega + \omega_0) + \delta(\omega - \omega_0)], \qquad \omega_0 = 2\pi f_0 \tag{7-23}$$

It is of interest to determine the area of the spectral density in order to verify that these equations do, in fact, lead to the proper mean-square value. Thus, according to (7-13),

$$\overline{X^2} = \int_{-\infty}^{\infty}\left\{A^2\delta(f) + \frac{B^2}{4}[\delta(f + f_0) + \delta(f - f_0)]\right\} df = A^2 + \frac{B^2}{4} + \frac{B^2}{4} = A^2 + \frac{1}{2}B^2$$

It can be readily determined that this is the same result that would be obtained from the ensemble average of $X^2(t)$.

A numerical example will serve to illustrate discrete spectral densities. Assume that a stationary random process has sample functions of the form

$$X(t) = 5 + 10\sin(12\pi t + \theta_1) + 8\cos(24\pi t + \theta_2)$$

in which $\theta_1$ and $\theta_2$ are independent random variables, both uniformly distributed between 0 and 2π. Note that because the phases are uniformly distributed over 2π radians, there is no difference between sine terms and cosine terms, and both can be handled with the results just discussed. This would not be true if the distribution of phases were not uniform over this range. Using (7-22), the spectral density of this process can be written immediately as

$$S_X(f) = 25\delta(f) + 25[\delta(f + 6) + \delta(f - 6)] + 16[\delta(f + 12) + \delta(f - 12)]$$

The mean-square value of this process can be obtained by inspection and is given by

$$\overline{X^2} = 25 + 25[1 + 1] + 16[1 + 1] = 107$$

It is apparent from this example that finding the spectral density and mean-square value of random discrete frequency components is quite simple and straightforward.

It is also possible to have spectral densities with both a continuous component and discrete components. An example of this sort that arises frequently in connection with communication systems or sampled-data control systems is the random amplitude pulse sequence shown in Figure 7-2. It is assumed here that all of the pulses have the same shape, but their amplitudes are random variables that are statistically independent from pulse to pulse. However, all the amplitude variables have the same mean, $\bar{Y}$, and the same variance, $\sigma_Y^2$.
The repetition period for the pulses is $t_1$, a constant, and the reference time for any sample function is $t_0$, which is a random variable uniformly distributed over an interval of $t_1$.

The complete derivation of the spectral density is too lengthy to be included here, but the final result indicates some interesting points. This result may be expressed in terms of the Fourier transform F(f) of the basic pulse shape f(t), and is
$$S_X(f) = |F(f)|^2\left[\frac{\sigma_Y^2}{t_1} + \frac{(\bar{Y})^2}{t_1^2}\sum_{n=-\infty}^{\infty}\delta\!\left(f - \frac{n}{t_1}\right)\right] \tag{7-24}$$

In terms of the variable ω this equation becomes

$$S_X(\omega) = |F(\omega)|^2\left[\frac{\sigma_Y^2}{t_1} + \frac{2\pi(\bar{Y})^2}{t_1^2}\sum_{n=-\infty}^{\infty}\delta\!\left(\omega - \frac{2\pi n}{t_1}\right)\right] \tag{7-25}$$

If the basic pulse shape is rectangular, with a width of $t_2$, the corresponding spectral density will be as shown in Figure 7-3.

Figure 7-2 Random amplitude pulse sequence (sample function X(t) made up of pulses $f(t - t_0)$).
Figure 7-3 Spectral density for rectangular pulse sequence with random amplitudes.

From (7-25) the following general conclusions are possible:

1. Both the continuous spectrum amplitude and the areas of the δ functions are proportional to the squared magnitude of the Fourier transform of the basic pulse shape.
2. If the mean value of the pulse amplitude is zero, there will be no discrete spectrum even though the pulses occur periodically.
3. If the variance of the pulse amplitude is zero, there will be no continuous spectrum.

The above result is illustrated by considering a sequence of rectangular pulses having random amplitudes. Let each pulse have the form

$$P(t) = \begin{cases} 1, & -0.01 \le t \le 0.01 \\ 0, & \text{elsewhere} \end{cases}$$

and assume that these are repeated periodically every 0.1 second and have independent random amplitudes that are uniformly distributed between 0 and 12. The first step is to find the Fourier transform of the pulse shape. This is

$$P(f) = \mathcal{F}\left\{\operatorname{rect}\left(\frac{t}{0.02}\right)\right\} = 0.02\operatorname{sinc}(0.02f)$$

Next we need to find the mean and variance of the random amplitudes. Since the amplitudes are uniformly distributed, the mean value is

$$\bar{Y} = \left(\frac{1}{2}\right)(0 + 12) = 6$$

and the variance is

$$\sigma_Y^2 = \left(\frac{1}{12}\right)(12 - 0)^2 = 12$$

The spectral density may now be obtained from (7-24) as

$$\begin{aligned} S_X(f) &= [0.02\operatorname{sinc}(0.02f)]^2\left[\frac{12}{0.1} + \frac{6^2}{(0.1)^2}\sum_{n=-\infty}^{\infty}\delta\!\left(f - \frac{n}{0.1}\right)\right] \\ &= 0.02\operatorname{sinc}^2\!\left(\frac{f}{50}\right)\left[2.4 + 72\sum_{n=-\infty}^{\infty}\delta(f - 10n)\right] \end{aligned}$$

Again it may be seen that there is a continuous part to the spectral density as well as an infinite number of discrete frequency components.

Another property of spectral densities concerns the derivative of the random process. Suppose that $\dot{X}(t) = dX(t)/dt$ and that X(t) has a spectral density $S_X(\omega)$, which was defined as

$$S_X(\omega) = \lim_{T\to\infty}\frac{E[|F_X(\omega)|^2]}{2T}$$

The truncated version of the derivative, $\dot{X}_T(t)$, will have a Fourier transform of $j\omega F_X(\omega)$, with
the possible addition of two constant terms (arising from the discontinuities at ±T) that will vanish in the limit. Hence, the spectral density of the derivative becomes

$$S_{\dot{X}}(\omega) = \lim_{T\to\infty}\frac{E[j\omega F_X(\omega)(-j\omega)F_X(-\omega)]}{2T} = \omega^2 S_X(\omega) \tag{7-26}$$

It is seen, therefore, that differentiation creates a new process whose spectral density is simply ω² times the spectral density of the original process. In this connection, it should be noted that if $S_X(\omega)$ is finite at ω = 0, then $S_{\dot{X}}(\omega)$ will be zero at ω = 0. Furthermore, suppose $S_X(\omega)$ does not drop off more rapidly than 1/ω² as ω → ∞. Then $S_{\dot{X}}(\omega)$ will not decrease at large ω, and the mean-square value of the derivative will be infinite. This corresponds to the case of nondifferentiable random processes. With the frequency variable f the spectral density of the derivative of a stationary random process X(t) is

$$S_{\dot{X}}(f) = (2\pi f)^2 S_X(f)$$

Exercise 7-3.1

A stationary random process has a spectral density of the form

$$S_X(f) = 4\delta(f) + 18\delta(f + 8) + 18\delta(f - 8)$$

a) List the discrete frequencies present.
b) Find the mean value of the process.
c) Find the variance of the process.

Answers: 0, ±8, ±2, 36

Exercise 7-3.2

A random process consists of a sequence of rectangular pulses having a duration of 1 ms and occurring every 5 ms. The pulse amplitudes are independent random variables that are uniformly distributed between A and B. For each of the following sets of values for A and B, determine if the spectral density has a continuous component, discrete components, both, or neither.
a) A = −5, B = 5
b) A = 5, B = 15
c) A = 8, B = 8
d) A = 0, B = 8

Answers: Both, both, discrete only, continuous only

7-4 Spectral Density and the Complex Frequency Plane

In the discussion so far, the spectral density has been expressed as a function of the real frequency f or the real angular frequency ω. However, for applications to system analysis, it is very convenient to express it in terms of the complex frequency s, since system transfer functions are more convenient in this form. This change can be made very simply by replacing jω with s. Hence, along the jω-axis of the complex frequency plane, the spectral density will be the same as that already discussed.

The formal conversion to complex frequency representation is accomplished by replacing ω by −js. The resulting spectral density should properly be designated as $S_X(-js)$, but this notation is somewhat clumsy. Therefore, spectral density in the s-plane will be designated simply as $S_X(s)$. It is evident that $S_X(s)$ and $S_X(\omega)$ are somewhat different functions of their respective arguments, so the notation is symbolic rather than precise, as in the case of $S_X(\omega)$ and $S_X(f)$.

For the special case of rational spectral densities, in which only even powers of ω occur, this substitution is equivalent to replacing ω² by −s². For example, consider the rational spectrum

$$S_X(\omega) = \frac{10(\omega^2 + 5)}{\omega^4 + 10\omega^2 + 24}$$

When expressed as a function of s, this becomes

$$S_X(s) = S_X(-js) = \frac{10(-s^2 + 5)}{s^4 - 10s^2 + 24} \tag{7-27}$$

Any spectral density can also be represented (except for a constant of proportionality) in terms of its pole-zero configuration in the complex frequency plane. Such a representation is often convenient in carrying out certain calculations, which will be discussed in the following sections. For purposes of illustration, consider the spectral density of (7-27). This may be factored as

$$S_X(s) = \frac{-10(s + \sqrt{5})(s - \sqrt{5})}{(s + 2)(s - 2)(s + \sqrt{6})(s - \sqrt{6})}$$

and the pole-zero configuration plotted as shown in Figure 7-4. This plot also illustrates the important point that such configurations are always symmetrical about the jω-axis.
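The substitution ω² → −s² and the symmetry of the pole-zero pattern can both be checked mechanically. The sketch below assumes that numerator and denominator are given as coefficient lists in the variable u = ω², highest power first. `u_poly_to_s` and `mirrored` are hypothetical helpers written for this illustration, not library functions; `numpy.roots` does the factoring. Applied to (7-27), the sketch should recover zeros at ±√5 and poles at ±2 and ±√6, each mirrored across the jω-axis.

```python
import numpy as np

def u_poly_to_s(coeffs_u):
    """Map p(u), u = omega**2, to the coefficients of p(-s**2) in powers of s.

    coeffs_u lists p's coefficients from the highest power of u down to u**0.
    The result lists coefficients of s**(2k), s**(2k-1), ..., s**0; odd powers are zero.
    """
    degree = len(coeffs_u) - 1
    out = []
    for i, c in enumerate(coeffs_u):
        power = degree - i                 # this coefficient multiplies u**power
        out.append(c * (-1) ** power)      # (-s**2)**power = (-1)**power * s**(2*power)
        if power > 0:
            out.append(0.0)                # coefficient of the odd power s**(2*power - 1)
    return np.array(out, dtype=float)

def mirrored(roots, tol=1e-9):
    """True if every root r has a partner -r (symmetry about the j-omega axis)."""
    return all(np.min(np.abs(roots + r)) < tol for r in roots)

# Equation (7-27): S_X(omega) = 10(omega^2 + 5) / (omega^4 + 10 omega^2 + 24)
num_s = u_poly_to_s([10, 50])       # -10 s^2 + 50
den_s = u_poly_to_s([1, 10, 24])    # s^4 - 10 s^2 + 24
zeros = np.sort(np.roots(num_s).real)
poles = np.sort(np.roots(den_s).real)
print("zeros:", zeros)   # numerically -sqrt(5), +sqrt(5)
print("poles:", poles)   # numerically -sqrt(6), -2, 2, sqrt(6)
print("symmetric:", mirrored(np.roots(num_s)), mirrored(np.roots(den_s)))
```

Working in u = ω² keeps the input short and makes the sign pattern of the conversion explicit: each coefficient of $u^k$ is multiplied by $(-1)^k$.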
Figure 7-4 Pole-zero configuration for a spectral density (s-plane, jω-axis vertical).

Dealing with rational spectral densities can be greatly simplified by the use of special computer programs. These can factor polynomials, multiply polynomials, expand rational functions in partial fractions, and generate rational functions from their poles and residues. Three MATLAB programs are particularly useful in this regard:

- poly(r), which generates polynomial coefficients given a vector r of roots;
- roots(a), which finds the roots of a polynomial whose coefficients are specified by the vector a;
- conv(a,b), which generates the coefficients of the product of two polynomials whose coefficients are specified by the vectors a and b.

As an example, consider a stationary random process having a spectral density of the form

$$S_X(\omega) = \frac{\omega^2(\omega^2 + 25)}{\omega^6 - 33\omega^4 + 463\omega^2 + 7569} \tag{7-28}$$

That this is a valid spectral density can be established by showing that it is always positive, since it clearly is real and even as a function of ω. This can be done in various ways. In the present case it is necessary to show only that the denominator is never negative. This is easily done by making a plot of the denominator as a function of ω. A simple MATLAB program using the function polyval that carries out this operation is as follows.

    w = 0:.05:2;
    plot(w, polyval([1, 0, -33, 0, 463, 0, 7569], w))
    grid; xlabel('w'); ylabel('d(w)')

The plot is shown in Figure 7-5, and it is evident that the denominator is always positive. Converting $S_X(\omega)$ to $S_X(s)$ gives

$$S_X(s) = \frac{-s^2(s^2 - 25)}{s^6 + 33s^4 + 463s^2 - 7569}$$

The zeros (roots of the numerator) are seen by inspection to be 0, 0, 5, and −5. The poles (roots of the denominator) are readily found using the MATLAB command

    roots([1, 0, 33, 0, 463, 0, -7569])

and are 2 + j5, 2 − j5, −2 + j5, −2 − j5, 3, −3.
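For readers working outside MATLAB, NumPy offers close counterparts of the functions named above: `numpy.polyval`, `numpy.roots`, `numpy.poly` and `numpy.convolve` (the last plays the role of `conv`). Both packages list coefficients from the highest power down, so the coefficient vectors carry over unchanged. The sketch below is a hedged translation, not the book's own program:

- It repeats the positivity check on the denominator of (7-28) over a wider grid than the plot; the grid limits are our choice.
- It recovers the poles quoted above.
- It rebuilds the denominator from those poles. The grouping of the complex poles into $s^4 + 42s^2 + 841$ is our own arithmetic from $(s^2 - 4s + 29)(s^2 + 4s + 29)$.

```python
import numpy as np

# Denominator of (7-28) as a polynomial in omega (highest power first).
d_omega = [1, 0, -33, 0, 463, 0, 7569]

# Counterpart of the MATLAB polyval plot: evaluate on a grid and inspect the minimum.
w = np.arange(0.0, 50.0, 0.05)
d_vals = np.polyval(d_omega, w)
print("minimum of d(w) on the grid:", d_vals.min())

# Denominator of S_X(s): replace omega^2 with -s^2, then multiply through by -1.
d_s = [1, 0, 33, 0, 463, 0, -7569]
poles = np.roots(d_s)
print("poles:", np.sort_complex(poles))

# Counterpart of poly(r): rebuild the polynomial from its roots.
rebuilt = np.real_if_close(np.poly(poles))
print("rebuilt coefficients:", np.round(rebuilt, 6))

# Counterpart of conv(a, b): (s^2 - 9) times (s^4 + 42 s^2 + 841).
product = np.convolve([1, 0, -9], [1, 0, 42, 0, 841])
print("conv check:", product)
```

A grid check of this kind only samples the denominator. It supports, but does not prove, the claim that the denominator never goes negative.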
Figure 7-5 Plot of the denominator of equation (7-28), d(ω) versus ω for 0 ≤ ω ≤ 2.

When the spectral density is not rational, the substitution is the same but may not be quite as straightforward. For example, the spectral density given by (7-25) could be expressed in the complex frequency plane as

$$S_X(s) = F(s)F(-s)\left[\frac{\sigma_Y^2}{t_1} + \frac{2\pi(\bar{Y})^2}{t_1^2}\sum_{n=-\infty}^{\infty}\delta\!\left(s - j\frac{2\pi n}{t_1}\right)\right]$$

where F(s) is the Laplace transform of the basic pulse shape f(t).

In addition to making spectral densities more convenient for system analysis, the use of the complex frequency s also makes it more convenient to evaluate mean-square values. This application is discussed in the following section.

Exercise 7-4.1

A stationary random process has spectral density of the form
$$S_X(\omega) = \frac{10(\omega^2 + 25)}{\omega^4 + 5\omega^2 + 4}$$

Find the pole and zero locations for this spectral density in the complex frequency plane.

Answers: ±5, ±1, ±2

Exercise 7-4.2

A stationary random process has a spectral density of the form

$$S_X(\omega) = \frac{\omega^2(\omega^2 + 25)}{\omega^6 - 6\omega^4 + 32}$$

a) Verify that this is a valid spectral density for all values of ω.
b) Find the pole and zero locations for this spectral density in the complex frequency plane.

Answers: 0, ±j2, ±5, ±√2, ±j2

7-5 Mean-Square Values from Spectral Density

It was shown in the course of defining the spectral density that the mean-square value of the random process was given by

$$\overline{X^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, d\omega \tag{7-12}$$

Hence, the mean-square value is proportional to the area of the spectral density.

The evaluation of an integral such as (7-12) may be very difficult if the spectral density has a complicated form or if it involves high powers of ω. A classical way of carrying out such integration is to convert the variable of integration to a complex variable (by substituting s for jω). One then uses some powerful theorems concerning integration around closed paths in the complex plane. This is probably the easiest and most satisfactory way of obtaining mean-square values. Unfortunately, it requires a knowledge of complex variables that the reader may not possess. The mechanics of the procedure are discussed at the end of this section, however, for those interested in this method.

An alternative method, which will be discussed first, is to utilize some tabulated results for spectral densities that are rational. These have been tabulated in general form for polynomials of various degrees, and their use is simply a matter of substituting in the appropriate numbers.
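Before turning to the tables, it may help to have a brute-force check on (7-12). The sketch below is our own numerical device, not the book's method:

- The substitution ω = tan θ maps the infinite range onto −π/2 < θ < π/2, and a midpoint sum over θ then approximates the area.
- For the rational spectra used here, the transformed integrand is smooth and has period π, so the sum converges quickly.
- The result is compared with the values quoted in the text and with the second-order entry of Table 7-1, $I_2 = (c_1^2 d_0 + c_0^2 d_2)/(2 d_0 d_1 d_2)$. The coefficients passed to `i2` are the $c_i$ and $d_i$ from the two rational worked examples of this section.

The function names, the choice a = 3 and the number of points are arbitrary assumptions.

```python
import numpy as np

def mean_square(S, n_points=200_000):
    """Approximate (1/2pi) * integral of S(omega) over the whole real line.

    Uses omega = tan(theta), d(omega) = sec^2(theta) d(theta), and a midpoint sum on
    (-pi/2, pi/2). Assumes S decays at least as fast as 1/omega^2.
    """
    h = np.pi / n_points
    theta = -np.pi / 2 + h * (np.arange(n_points) + 0.5)   # midpoints of the subintervals
    omega = np.tan(theta)
    integrand = S(omega) / np.cos(theta) ** 2
    return np.sum(integrand) * h / (2 * np.pi)

def i2(c1, c0, d2, d1, d0):
    """Second-order entry of Table 7-1."""
    return (c1**2 * d0 + c0**2 * d2) / (2 * d0 * d1 * d2)

a = 3.0
cases = {
    # name: (spectral density as a function of omega, value expected from the text)
    "2a/(w^2+a^2)": (lambda w: 2 * a / (w**2 + a**2), 1.0),
    "Exercise 7-2.2": (lambda w: 24 / (w**2 + 16), 3.0),
    "(w^2+4)/(...)": (lambda w: (w**2 + 4) / (w**4 + 10 * w**2 + 9), i2(1, 2, 1, 4, 3)),
    "continuous part": (lambda w: 25 * (w**2 + 16) / (w**4 + 34 * w**2 + 225),
                        i2(5, 20, 1, 8, 15)),
}
for name, (S, expected) in cases.items():
    print(f"{name:16s} numeric = {mean_square(S):.6f}   expected = {expected:.6f}")
```

The quadrature needs no factoring into c(s) and d(s). That makes it a convenient independent check on hand work with Table 7-1, though it gives numbers rather than the closed forms the table provides.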
The existence of such general forms is primarily a consequence of the symmetry of the spectral density. As a result of this symmetry, it is always possible to factor rational spectral densities into the form

$$S_X(s) = \frac{c(s)\,c(-s)}{d(s)\,d(-s)} \tag{7-29}$$

where c(s) contains the left-half-plane (lhp) zeros, c(−s) the right-half-plane (rhp) zeros, d(s) the lhp poles, and d(−s) the rhp poles.

When the real integration of (7-12) is expressed in terms of the complex variable s, the mean-square value becomes

$$\overline{X^2} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)\, ds = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\frac{c(s)\,c(-s)}{d(s)\,d(-s)}\, ds \tag{7-30}$$

For the special case of rational spectral densities, c(s) and d(s) are polynomials in s and may be written as

$$c(s) = c_{n-1}s^{n-1} + c_{n-2}s^{n-2} + \cdots + c_0$$
$$d(s) = d_ns^n + d_{n-1}s^{n-1} + \cdots + d_0$$

Some of the coefficients of c(s) may be zero, but d(s) must be of higher degree than c(s) and must not have any coefficients missing.

Integrals of the form in (7-30) have been tabulated for values of n up to 10, although beyond n = 3 or 4 the general results are so complicated as to be of doubtful value. An abbreviated table is given in Table 7-1.

As an example of this calculation, consider the spectral density

$$S_X(\omega) = \frac{\omega^2 + 4}{\omega^4 + 10\omega^2 + 9}$$

When ω is replaced by −js, this becomes

$$S_X(s) = \frac{-(s^2 - 4)}{s^4 - 10s^2 + 9} = \frac{-(s^2 - 4)}{(s^2 - 1)(s^2 - 9)} \tag{7-31}$$

This can be factored into

$$S_X(s) = \frac{(s + 2)(-s + 2)}{(s + 1)(s + 3)(-s + 1)(-s + 3)} \tag{7-32}$$

from which it is seen that

$$c(s) = s + 2$$
and

$$d(s) = (s + 1)(s + 3) = s^2 + 4s + 3$$

This is a case in which n = 2 and

$$c_1 = 1, \quad c_0 = 2, \qquad d_2 = 1, \quad d_1 = 4, \quad d_0 = 3$$

Table 7-1 Table of Integrals

$$I_n = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\frac{c(s)\,c(-s)}{d(s)\,d(-s)}\, ds$$
$$c(s) = c_{n-1}s^{n-1} + c_{n-2}s^{n-2} + \cdots + c_0$$
$$d(s) = d_ns^n + d_{n-1}s^{n-1} + \cdots + d_0$$
$$I_1 = \frac{c_0^2}{2d_0d_1}$$
$$I_2 = \frac{c_1^2d_0 + c_0^2d_2}{2d_0d_1d_2}$$

From Table 7-1, $I_2$ is given by

$$I_2 = \frac{c_1^2d_0 + c_0^2d_2}{2d_0d_1d_2} = \frac{(1)^2(3) + (2)^2(1)}{2(3)(4)(1)} = \frac{3 + 4}{24} = \frac{7}{24}$$

Since $\overline{X^2} = I_2$, it follows that

$$\overline{X^2} = \frac{7}{24}$$

The procedure just presented is a mechanical one, and in order to be a useful tool it does not require any deep understanding of the theory. Some precautions are necessary, however. In the first place, as noted above, it is necessary that c(s) be of lower degree than d(s). Second, it is
necessary that c(s) and d(s) have roots only in the left half plane. Finally, it is necessary that d(s) have no roots on the jω-axis. When d(s) has roots on the jω-axis the mean-square value is infinite and the integral of $S_X(s)$ is undefined.

In the example given above, the spectral density is rational and, hence, does not contain any δ functions. Thus, the random process that it represents has a mean value of zero, and the mean-square value that was calculated is also the variance of the process. There may be situations, however, in which the continuous part of the spectral density is rational, but there are also discrete components resulting from a nonzero mean value or from periodic components. In cases such as this, it is necessary to treat the continuous portion of the spectral density and the discrete portions of the spectral density separately when finding the mean-square value. An example will serve to illustrate the technique. Consider a spectral density of the form

$$S_X(\omega) = 8\pi\delta(\omega) + 36\pi\delta(\omega - 16) + 36\pi\delta(\omega + 16) + \frac{25(\omega^2 + 16)}{\omega^4 + 34\omega^2 + 225}$$

From the discussion in Section 7-3 and equation (7-24), it is clear that the contribution to the mean-square value from the discrete components is simply

$$\overline{X^2}_{\text{discrete}} = \frac{1}{2\pi}(8\pi + 36\pi + 36\pi) = 40$$

Note that this includes a mean value of ±2. The continuous portion of the spectral density may now be written as a function of s as

$$S_{Xc}(s) = \frac{25(-s^2 + 16)}{s^4 - 34s^2 + 225}$$

which, in factored form, becomes

$$S_{Xc}(s) = \frac{[5(s + 4)][5(-s + 4)]}{[(s + 3)(s + 5)][(-s + 3)(-s + 5)]}$$

It is now clear that

$$c(s) = 5(s + 4) = 5s + 20$$

from which $c_0 = 20$ and $c_1 = 5$. Also

$$d(s) = (s + 3)(s + 5) = s^2 + 8s + 15$$

from which $d_0 = 15$, $d_1 = 8$, and $d_2 = 1$. Using the expression for $I_2$ in Table 7-1 yields

$$\overline{X^2}_{\text{cont}} = \frac{(5)^2(15) + (20)^2(1)}{2(15)(8)(1)} = 3.229$$

Hence, the total mean-square value of this process is
$$\overline{X^2} = 40 + 3.229 = 43.229$$

Since the mean value of the process is $\pm 2$, the variance of the process becomes $\sigma_X^2 = 43.229 - (2)^2 = 39.229$.

It was noted previously that the use of complex integration provides a very general, and very powerful, method of evaluating integrals of the form given in (7-30). A brief summary of the theory of such integration is given in Appendix I, and these ideas will be utilized here to demonstrate another method of evaluating mean-square values from spectral density. As a means of acquainting the student with the potential usefulness of this general procedure, only the mechanics of this method are discussed. The student should be aware, however, that there are many pitfalls associated with using mathematical tools without having a thorough grasp of their theory. All students are encouraged to acquire the proper theoretical understanding as soon as possible.

The method considered here is based on the evaluation of residues, in much the same way as is done in connection with finding inverse Laplace transforms. Consider, for example, the spectral density given above in (7-31) and (7-32). This spectral density may be represented by the pole-zero configuration shown in Figure 7-6. The path of integration called for by (7-30) is along the $j\omega$-axis, but the methods of complex integration discussed in Appendix I require a closed path. Such a closed path can be obtained by adding a semicircle at infinity that encloses either the left-half plane or the right-half plane. Less difficulty with the algebraic signs is encountered if the left-half plane is used, so the path shown in Figure 7-7 will be assumed from now on. For the integral around this closed path to be the same as the integral along the $j\omega$-axis, it is necessary for the contribution due to the semicircle to vanish as $R \to \infty$.
For rational spectral densities this will be true whenever the denominator polynomial is of higher degree than the numerator polynomial (since only even powers are present, the denominator is then at least two degrees higher, so the integrand falls off at least as fast as $1/|s|^2$ on the semicircle).

A basic result of complex variable theory states that the value of an integral around a simple closed contour in the complex plane is equal to $2\pi j$ times the sum of the residues at the poles contained within that contour (see (I-3), Appendix I). Since the expression for the mean-square value has a factor of $1/(2\pi j)$, and since the chosen contour completely encloses the left-half plane, it follows that the mean-square value can be expressed in general as

$$\overline{X^2} = \sum (\text{residues at lhp poles}) \tag{7-33}$$

Figure 7-6 Pole-zero configuration for a spectral density.

For the example being considered, the only lhp poles are at $-1$ and $-3$. The residues can be
evaluated easily by multiplying $S_X(s)$ by the factor containing the pole in question and letting $s$ assume the value of the pole. Thus,

$$K_{-1} = [(s+1)S_X(s)]_{s=-1} = \left[\frac{-(s+2)(s-2)}{(s-1)(s+3)(s-3)}\right]_{s=-1} = \frac{3}{16}$$

$$K_{-3} = [(s+3)S_X(s)]_{s=-3} = \left[\frac{-(s+2)(s-2)}{(s+1)(s-1)(s-3)}\right]_{s=-3} = \frac{5}{48}$$

From (7-33) it follows that

$$\overline{X^2} = \frac{3}{16} + \frac{5}{48} = \frac{7}{24}$$

which is the same value obtained above.

Figure 7-7 Path of integration for evaluating mean-square value.

If the poles are not simple, the more general procedures discussed in Appendix I may be employed for evaluating the residues. However, the mean-square value is still obtained from (7-33).

The computer is a powerful tool for computing the mean-square value of a process from its spectral density. It provides several different approaches to this calculation; the two most obvious are direct integration and summing the residues at the poles in the left-half plane. The following example illustrates these procedures, and other examples are given in Appendix G. Let the spectral density be the same as considered previously, i.e.,

$$S_X(\omega) = \frac{\omega^2 + 4}{\omega^4 + 10\omega^2 + 9} \qquad S_X(s) = \frac{-s^2 + 4}{s^4 - 10s^2 + 9}$$

The mean-square value is given by

$$\overline{X^2} = \frac{1}{\pi}\int_0^\infty \frac{\omega^2 + 4}{\omega^4 + 10\omega^2 + 9}\,d\omega$$

By making a MATLAB function that gives values of $S_X(\omega)$ when called, the integral can be evaluated using the commands quad(function,a,b) or quad8(function,a,b), where the
function is $S_X(\omega)$ and a and b are the limits of integration. Limits must be chosen such that no significant area under the function is missed. In the present case an upper limit of 1000 will be used, since $S_X(\omega)$ is negligibly small for that value of $\omega$. The M-file to calculate the $S_X(\omega)$ function is as follows.

    % Sx.m
    function y = Sx(w)
    a = [1, 0, 4]; b = [1, 0, 10, 0, 9];   % numerator and denominator coefficients of S_X(w)
    y = polyval(a, w)./polyval(b, w);      % element-wise division, so w may be a vector

The integration is then carried out with the command

    P = (1/pi)*quad8('Sx', 0, 1000)   % integral(@Sx, 0, 1000) in releases without quad8

and the result is $\overline{X^2} = 0.2913$. The factor $1/\pi$ comes from the expression for $\overline{X^2}$ above; the small shortfall from $7/24 = 0.2917$ is the area beyond the upper limit of 1000.

Alternatively, the residue method can be used. The residues and poles of $S_X(s)$ can be found using the MATLAB command [r, p, k] = residue(b, a), where b and a are the coefficients of the numerator and denominator polynomials; r is a vector of residues corresponding to the vector of poles, p; and k is the constant term that results if the denominator is not of higher order than the numerator. For the present example the result is

    [r,p,k] = residue([-1,0,4],[1,0,-10,0,9])
    r =
       -0.1042
        0.1042
       -0.1875
        0.1875
    p =
        3.0000
       -3.0000
        1.0000
       -1.0000
    k =
         []

The mean-square value is found as the sum of residues at the left-half plane poles and is $0.1042 + 0.1875 = 0.2917$.
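The three routes to $\overline{X^2}$ used in this section (Table 7-1, residues at the lhp poles, and direct numerical integration) can be cross-checked with a short Python sketch. It is not the book's code: the helper names `polyval`, `polyder`, `table_I2`, `lhp_residue_sum` and `simpson` are hypothetical and use only the standard library. `polyval` and `polyder` mimic MATLAB's coefficient ordering (highest power first). The residue helper assumes simple poles whose locations are already known, here from the factored $d(s)$. Simpson's rule stands in for quad8.

```python
import math
from fractions import Fraction


def polyval(coeffs, x):
    """Evaluate a polynomial by Horner's rule (highest power first, as in MATLAB)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def polyder(coeffs):
    """Coefficients of the derivative polynomial (highest power first)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def table_I2(c1, c0, d2, d1, d0):
    """Table 7-1 entry I_2 = (c1^2 d0 + c0^2 d2) / (2 d0 d1 d2), computed exactly."""
    return Fraction(c1**2 * d0 + c0**2 * d2, 2 * d0 * d1 * d2)


def lhp_residue_sum(num, den, lhp_poles):
    """Equation (7-33): sum of residues of num(s)/den(s) at simple lhp poles.

    For a simple pole p, the residue equals num(p) / den'(p)."""
    dden = polyder(den)
    return sum(Fraction(polyval(num, p)) / polyval(dden, p) for p in lhp_poles)


def simpson(f, a, b, n=200_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# S_X(s) = (-s^2 + 4)/(s^4 - 10 s^2 + 9), with c(s) = s + 2 and d(s) = s^2 + 4s + 3.
i2 = table_I2(c1=1, c0=2, d2=1, d1=4, d0=3)
res = lhp_residue_sum([-1, 0, 4], [1, 0, -10, 0, 9], lhp_poles=[-1, -3])


def S_w(w):
    """S_X(w) = (w^2 + 4)/(w^4 + 10 w^2 + 9), the same function as Sx.m."""
    return polyval([1, 0, 4], w) / polyval([1, 0, 10, 0, 9], w)


# (1/pi) times the integral from 0 to 1000, as in the MATLAB command.
numeric = simpson(S_w, 0.0, 1000.0) / math.pi

# Continuous part of the example with discrete components: c(s) = 5s + 20, d(s) = s^2 + 8s + 15.
cont = table_I2(c1=5, c0=20, d2=1, d1=8, d0=15)
total = 40 + cont          # 40 is the discrete contribution
variance = total - 2**2    # subtract the square of the mean value

print(i2, res, round(numeric, 4))
print(round(float(total), 3), round(float(variance), 3))
```

Both exact routes print 7/24. The truncated integral prints 0.2913, matching the MATLAB result. The second line reproduces 43.229 and 39.229 for the example with discrete components. Exact `Fraction` arithmetic is used where the inputs are integers, so the agreement with Table 7-1 is not blurred by rounding.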
Exercise 7-5.1

A stationary random process has a spectral density of

$$S_X(\omega) = \frac{25\omega^2}{\omega^4 + 10\omega^2 + 9}$$

a) Find the mean-square value of this process using the results in Table 7-1.
b) Repeat using contour integration.

Answer: 25/8

Exercise 7-5.2

Find the mean-square value of the random process of Exercise 7-5.1 using numerical integration.

Answer: 3.1170 [using (1/pi)*quad8(spec,0,1000)]

7-6 Relation of Spectral Density to the Autocorrelation Function

The autocorrelation function is shown in Chapter 6 to be the expected value of the product of time functions. In this chapter, it has been shown that the spectral density is related to the expected value of the product of Fourier transforms. It would appear, therefore, that there should be some direct relationship between these two expected values. Almost intuitively one would expect the spectral density to be the Fourier (or Laplace) transform of the autocorrelation function, and this turns out to be the case.

We consider first the case of a nonstationary random process and then specialize the result to a stationary process. In (7-10) the spectral density was defined as

$$S_X(\omega) = \lim_{T\to\infty} \frac{E\left[|F_X(\omega)|^2\right]}{2T} \tag{7-10}$$

where $F_X(\omega)$ is the Fourier transform of the truncated sample function. Thus,
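$$F_X(\omega) = \int_{-T}^{T} X(t)\,e^{-j\omega t}\,dt, \qquad T < \infty \tag{7-34}$$

Substituting (7-34) into (7-10) yields

$$S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T}\,E\left[\int_{-T}^{T} X(t_1)\,e^{j\omega t_1}\,dt_1 \int_{-T}^{T} X(t_2)\,e^{-j\omega t_2}\,dt_2\right] \tag{7-35}$$

since $|F_X(\omega)|^2 = F_X(\omega)F_X(-\omega)$. The subscripts on $t_1$ and $t_2$ have been introduced so that we can distinguish the variables of integration when the product of integrals is rewritten as an iterated double integral. Thus, write (7-35) as

$$S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} dt_2 \int_{-T}^{T} E[X(t_1)X(t_2)]\,e^{-j\omega(t_2 - t_1)}\,dt_1 \tag{7-36}$$

Moving the expectation operation inside the double integral can be shown to be valid in this case, but the details are not discussed here.

The expectation in the integrand above is recognized as the autocorrelation function of the truncated process. Thus,

$$E[X(t_1)X(t_2)] = \begin{cases} R_X(t_1, t_2), & |t_1|, |t_2| \le T \\ 0, & \text{elsewhere} \end{cases}$$

Making the substitution

$$t_2 - t_1 = \tau, \qquad dt_2 = d\tau$$

we can write (7-36) as

$$S_X(\omega) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} dt_1 \int_{-T-t_1}^{T-t_1} e^{-j\omega\tau}\,R_X(t_1, t_1+\tau)\,d\tau \tag{7-37}$$

where the limits on $\tau$ are imposed by the truncation $|t_1 + \tau| \le T$. Interchanging the order of integration and moving the limit inside the $\tau$-integral gives

$$S_X(\omega) = \int_{-\infty}^{\infty}\left\{\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} R_X(t_1, t_1+\tau)\,dt_1\right\} e^{-j\omega\tau}\,d\tau \tag{7-38}$$

From (7-38) it is apparent that the spectral density is the Fourier transform of the time average of the autocorrelation function. This may be expressed in shorter notation as follows: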
$$S_X(\omega) = \mathcal{F}\{\langle R_X(t, t+\tau)\rangle\} \tag{7-39}$$

where $\langle\cdot\rangle$ denotes the time average. The relationship given in (7-39) is valid for nonstationary as well as stationary processes.

If the process in question is a stationary random process, the autocorrelation function is independent of time; therefore,

$$\langle R_X(t_1, t_1+\tau)\rangle = R_X(\tau)$$

Accordingly, the spectral density of a wide-sense stationary random process is just the Fourier transform of the autocorrelation function; that is,

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j\omega\tau}\,d\tau = \mathcal{F}\{R_X(\tau)\} \tag{7-40}$$

The relationship in (7-40), which is known as the Wiener-Khinchine relation, is of fundamental importance in analyzing random signals because it provides the link between the time domain (correlation function) and the frequency domain (spectral density). Because of the uniqueness of the Fourier transform, it follows that the autocorrelation function of a wide-sense stationary random process is the inverse transform of the spectral density. In the case of a nonstationary process, the autocorrelation function cannot be recovered from the spectral density; only the time average of the correlation function can be, as seen from (7-39). In subsequent discussions, we will deal only with wide-sense stationary random processes for which (7-40) is valid.

As a simple example of this result, consider an autocorrelation function of the form

$$R_X(\tau) = A e^{-\beta|\tau|}, \qquad A > 0,\ \beta > 0$$

The absolute value sign on $\tau$ is required by the symmetry of the autocorrelation function. This function is shown in Figure 7-8(a) and is seen to have a discontinuous derivative at $\tau = 0$. Hence, it is necessary to write (7-40) as the sum of two integrals, one for negative values of $\tau$ and one for positive values of $\tau$. Thus,

$$\begin{aligned} S_X(\omega) &= \int_{-\infty}^{0} A e^{\beta\tau} e^{-j\omega\tau}\,d\tau + \int_{0}^{\infty} A e^{-\beta\tau} e^{-j\omega\tau}\,d\tau \\ &= A\left.\frac{e^{(\beta - j\omega)\tau}}{\beta - j\omega}\right|_{-\infty}^{0} + A\left.\frac{e^{-(\beta + j\omega)\tau}}{-(\beta + j\omega)}\right|_{0}^{\infty} \\ &= A\left[\frac{1}{\beta - j\omega} + \frac{1}{\beta + j\omega}\right] = \frac{2A\beta}{\omega^2 + \beta^2} \end{aligned} \tag{7-41}$$

This spectral density is shown in Figure 7-8(b).
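Equation (7-41) can be checked numerically by evaluating the two integrals of (7-40) directly. The sketch splits them at $\tau = 0$, exactly as in the derivation. It is an illustration only and not part of the text. The function names, the trapezoidal rule, and the truncation of the $\tau$ axis at $\pm$`tau_max` are choices made for this example. `tau_max` is an assumed cutoff beyond which $Ae^{-\beta|\tau|}$ is negligible.

```python
import cmath
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals (f may be complex)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def spectral_density_numeric(A, beta, w, tau_max=40.0, n=20_000):
    """Approximate (7-40) for R_X(tau) = A exp(-beta |tau|).

    The integral is split at tau = 0, where R_X has a slope discontinuity,
    and truncated to [-tau_max, tau_max]; the cutoff must be large compared
    with 1/beta for the truncation error to be negligible."""
    def integrand(tau):
        return A * math.exp(-beta * abs(tau)) * cmath.exp(-1j * w * tau)

    negative_part = trapezoid(integrand, -tau_max, 0.0, n)   # tau < 0 half
    positive_part = trapezoid(integrand, 0.0, tau_max, n)    # tau > 0 half
    return negative_part + positive_part


def spectral_density_closed_form(A, beta, w):
    """Right-hand side of (7-41)."""
    return 2 * A * beta / (w**2 + beta**2)


for w in (0.0, 1.0, 3.0):
    approx = spectral_density_numeric(A=3.0, beta=2.0, w=w)
    exact = spectral_density_closed_form(A=3.0, beta=2.0, w=w)
    print(w, approx.real, exact, approx.imag)
```

The real parts agree with $2A\beta/(\omega^2+\beta^2)$ to within the small trapezoidal-rule error. The imaginary parts cancel to rounding level, as the even symmetry of $R_X(\tau)$ requires.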
Figure 7-8 Relation between (a) autocorrelation function and (b) spectral density.

In the stationary case it is also possible to find the autocorrelation function corresponding to a given spectral density by using the inverse Fourier transform. Thus,

$$R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,e^{j\omega\tau}\,d\omega \tag{7-42}$$

An example of the application of this result will be given in the next section.

In obtaining the result in (7-41), the integral was separated into two parts because of the discontinuous slope at the origin. An alternative procedure, which is possible in all cases, is to take advantage of the symmetry of autocorrelation functions. Thus, if (7-40) is written as

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\,[\cos\omega\tau - j\sin\omega\tau]\,d\tau$$

by expressing the exponential in terms of sines and cosines, it may be noted that $R_X(\tau)\sin\omega\tau$ is an odd function of $\tau$ and, hence, will integrate to zero. On the other hand, $R_X(\tau)\cos\omega\tau$ is even, and the integral from $-\infty$ to $+\infty$ is just twice the integral from 0 to $+\infty$. Hence,

$$S_X(\omega) = 2\int_0^\infty R_X(\tau)\cos\omega\tau\,d\tau \tag{7-43}$$

is an alternative form that does not require integrating over the origin. The corresponding inversion formula, for wide-sense stationary processes, is easily shown to be

$$R_X(\tau) = \frac{1}{\pi}\int_0^\infty S_X(\omega)\cos\omega\tau\,d\omega \tag{7-44}$$

It was noted earlier that the relationship between spectral density and correlation function can also be expressed in terms of the Laplace transform. However, it should be recalled that the form of the Laplace transform used most often in system analysis requires that the time function being transformed be zero for negative values of time. Autocorrelation functions can never be
zero for negative values of $\tau$, since they are always even functions of $\tau$. Hence, it is necessary to use the two-sided Laplace transform for this application. The corresponding transform pair may be written as

$$S_X(s) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-s\tau}\,d\tau \tag{7-45}$$

and

$$R_X(\tau) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_X(s)\,e^{s\tau}\,ds \tag{7-46}$$

Since the spectral density of a process having a finite mean-square value can have no poles on the $j\omega$-axis, the path of integration in (7-46) can always be on the $j\omega$-axis.

The direct two-sided Laplace transform, which yields the spectral density from the autocorrelation function, is no different from the ordinary one-sided Laplace transform and does not require any special comment. However, the inverse two-sided Laplace transform does require a little more care, so a simple example of this operation is desirable.

Consider the spectral density found in (7-41) and write it as a function of $s$ as

$$S_X(s) = \frac{-2A\beta}{s^2 - \beta^2} = \frac{-2A\beta}{(s + \beta)(s - \beta)}$$

in which there is one pole in the left-half plane and one pole in the right-half plane. Because of the symmetry of spectral densities, there will always be as many rhp poles as there are lhp poles. A partial fraction expansion of the above expression yields

$$S_X(s) = \frac{A}{s + \beta} - \frac{A}{s - \beta}$$

The inverse Laplace transform of the lhp terms in any partial fraction expansion is usually interpreted to represent a time function that exists in positive time only. Hence, in this case we can interpret the inverse transform of the lhp term to be

$$\frac{A}{s + \beta} \Leftrightarrow A e^{-\beta\tau}, \qquad \tau > 0$$

Because we are dealing with an autocorrelation function here, it is possible to use the property that such functions are even in order to obtain the value of the autocorrelation function for negative values of $\tau$. However, it is useful to discuss a more general technique that can also be used for crosscorrelation functions, in which this type of symmetry does not exist.
Thus, for those factors in the partial fraction expansion that come from rhp poles, it is always possible to (1) replace $s$ by $-s$, (2) find the single-sided inverse Laplace transform of what is now an lhp function, and (3) replace $\tau$ by $-\tau$. Using this procedure on the rhp factor above yields
$$\frac{-A}{-s - \beta} = \frac{A}{s + \beta} \Leftrightarrow A e^{-\beta\tau}, \qquad \tau > 0$$

Replacing $\tau$ by $-\tau$ then yields

$$\frac{-A}{s - \beta} \Leftrightarrow A e^{\beta\tau}, \qquad \tau < 0$$

Thus, the resulting autocorrelation function is

$$R_X(\tau) = A e^{-\beta|\tau|}, \qquad -\infty < \tau < \infty$$

which is exactly the autocorrelation function we started with. The technique illustrated by this example is sufficiently general to handle transformations from spectral densities to autocorrelation functions as well as from cross-spectral densities (which are discussed in a subsequent section) to crosscorrelation functions.

Exercise 7-6.1

A stationary random process has an autocorrelation function of the form

Find the spectral density of this process.

Answer: $S_X(\omega) = \dfrac{10\omega^2 + 40}{\omega^4 + 17\omega^2 + 16}$

Exercise 7-6.2

A stationary random process has a spectral density of the form

$$S_X(\omega) = \frac{8\omega^2 + 224}{\omega^4 + 20\omega^2 + 64}$$

Find the autocorrelation function of this process.

Answer: $R_X(\tau) = 4e^{-2|\tau|} - e^{-4|\tau|}$
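For a rational spectral density with simple, real poles, the rhp procedure above reduces to two steps. First, compute the partial-fraction coefficient $K$ at each lhp pole $p$. Then write $R_X(\tau) = \sum K e^{p|\tau|}$. Each lhp term $K/(s-p)$ gives $Ke^{p\tau}$ for $\tau > 0$, and by the even symmetry of $S_X(s)$ the matching rhp term gives the mirror image for $\tau < 0$. The Python sketch below applies this to Exercise 7-6.2 as a check on its stated answer. The helper names are hypothetical, and only the standard library is used. The lhp pole locations are supplied by hand from the factored denominator; the sketch does not find roots. Complex poles are not handled.

```python
import math
from fractions import Fraction


def polyval(coeffs, x):
    """Evaluate a polynomial by Horner's rule (highest power first)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def polyder(coeffs):
    """Coefficients of the derivative polynomial (highest power first)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def autocorrelation_terms(num, den, lhp_poles):
    """Return (K, p) pairs for S_X(s) = num(s)/den(s) with simple, real lhp poles p.

    K = num(p)/den'(p) is the partial-fraction coefficient of K/(s - p);
    the corresponding autocorrelation term is K*exp(p*|tau|)."""
    dden = polyder(den)
    return [(Fraction(polyval(num, p)) / polyval(dden, p), p) for p in lhp_poles]


def R_X(terms, tau):
    """Evaluate R_X(tau) = sum of K*exp(p*|tau|); p < 0, so each term decays."""
    return sum(float(K) * math.exp(p * abs(tau)) for K, p in terms)


# Exercise 7-6.2: S_X(w) = (8w^2 + 224)/(w^4 + 20w^2 + 64); substituting w^2 = -s^2,
# S_X(s) = (-8s^2 + 224)/(s^4 - 20s^2 + 64) = (-8s^2 + 224)/((s^2 - 4)(s^2 - 16)).
terms = autocorrelation_terms([-8, 0, 224], [1, 0, -20, 0, 64], lhp_poles=[-2, -4])
print(terms)          # coefficients 4 and -1, i.e. R_X(tau) = 4e^{-2|tau|} - e^{-4|tau|}
print(R_X(terms, 0))  # mean-square value R_X(0) = 3.0
```

As a cross-check, applying (7-41) term by term recovers the original density: $16/(\omega^2+4) - 8/(\omega^2+16) = (8\omega^2+224)/(\omega^4+20\omega^2+64)$.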
7-7 White Noise

The concept of white noise was mentioned previously. This term is applied to a spectral density that is constant for all values of $f$; that is, $S_X(f) = S_0$. It is interesting to determine the correlation function for such a process. This is best done by giving the result and verifying its correctness. Consider an autocorrelation function that is a $\delta$ function of the form

$$R_X(\tau) = S_0\,\delta(\tau)$$

Using this form in (7-40) leads to

$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty} S_0\,\delta(\tau)\,e^{-j2\pi f\tau}\,d\tau = S_0 \tag{7-47}$$

which is the result for white noise. It is clear, therefore, that the autocorrelation function for white noise is just a $\delta$ function with an area equal to the spectral density.

It was noted previously that the concept of white noise is fictitious because such a process would have an infinite mean-square value, since the area of the spectral density is infinite. This same conclusion is also apparent from the correlation function. It may be recalled that the mean-square value is equal to the value of the autocorrelation function at $\tau = 0$. For a $\delta$ function at the origin, this is also infinite. Nevertheless, the white-noise concept is an extremely valuable one in the analysis of linear systems. It frequently turns out that the random signal input to a system has a bandwidth that is much greater than the range of frequencies that the system is capable of passing. Under these circumstances, assuming the input spectral density to be white may greatly simplify the computation of the system response without introducing any significant error. Examples of this sort are discussed in Chapters 8 and 9.

Another concept that is frequently used is that of bandlimited white noise. This implies a spectral density
