- 1. INTENSE PULSED LIGHT AND ROSACEA
- 2. Rosacea is a chronic acneiform disorder affecting both the skin and the eye. It is a syndrome of undetermined etiology characterized by both vascular and papulopustular components, involving the face and occasionally the neck and upper trunk.
- 3. Presentation – Facial erythema. – Telangiectasia. – Papules. – Pustules. – Sebaceous gland hypertrophy (Rhinophyma). – Ocular lesions.
- 5. IPL is an effective treatment for the signs and symptoms of rosacea and represents a new category of therapeutic options for rosacea patients.
- 7. Flash lamps emit every wavelength of light in the visible spectrum and extend a little into the infrared (IR) band, up to 1200 nm.
- 9. Pulsed dye laser (disadvantages): – Small spot size. – Purpura. – Time consuming. – Period of downtime.
- 10. Why IPL? – Wavelength is not fixed (wide range). – Large spot size (treatment of the entire face). – Purpura is uncommon (no-downtime procedure). – Minimal risk of scarring.
- 11. The light energy passes through the epidermis and is absorbed by oxyhemoglobin through selective photothermolysis.
- 12. Before IPL After
- 13. Before IPL After
- 14. Before IPL After
- 15. Before IPL After
- 17. Fungal Infections General Properties of Fungi
- 18. Structural-functional relationships The fungi are more evolutionarily advanced forms of microorganisms. Fungi can be divided into two basic morphological forms, yeasts and hyphae.
- 19. Yeasts are unicellular fungi which reproduce asexually by blastoconidia formation (budding) or fission. Hyphae are multi-cellular fungi which reproduce asexually and/or sexually.
- 20. Dimorphism is the condition whereby a fungus can exhibit either the yeast form or the hyphal form, depending on growth conditions. Very few fungi exhibit dimorphism. Most fungi occur in the hyphal form as branching, threadlike tubular filaments.
- 21. These filamentous structures either lack cross walls (coenocytic) or have cross walls (septate) depending on the species. In some cases septate hyphae develop clamp connections at the septa which connect the hyphal elements.
- 22. A. Yeast cells reproducing by blastoconidia formation; B. Yeast dividing by fission; C. Pseudohyphal development; D. Coenocytic hyphae; E. Septate hyphae; F. Septate hyphae with clamp connections
- 23. The presence/absence of conidia and their size, shape and location are major features used in the laboratory to identify the species of fungus in clinical specimens.
- 24. Metabolism All fungi are free living, i.e., they are not obligate intracellular parasites.
- 25. For medical purposes the important aspects of fungal metabolism are: 1. The synthesis of chitin and other compounds, for use in forming the cell wall. These induce immune hypersensitivity.
- 26. 2. The synthesis of ergosterol for incorporation into the plasma membrane. This makes the plasma membrane sensitive to those antimicrobial agents which either block the synthesis of ergosterol or prevent its incorporation into the membrane or bind to it, e.g. amphotericin B.
- 27. 3. The synthesis of toxins such as aflatoxins. These are carcinogens produced by Aspergillus flavus when growing on grain. They affect humans when the grain is eaten directly, or when it is fed to dairy cattle and the toxins get into the milk supply.
- 28. 4. The synthesis of proteins on ribosomes that are different from those found in bacteria. This makes the fungi immune to those antimicrobial agents that are directed against the bacterial ribosome, e.g. chloramphenicol.
- 29. Classification of Fungal Infections 1. Superficial mycoses 2. Cutaneous mycoses 3. Subcutaneous mycoses 4. Systemic mycoses 5. Opportunistic mycoses
- 30. In general, humans have a high level of innate immunity to fungi and most of the infections they cause are mild and self- limiting.
- 31. This resistance is due to: fatty acid content of the skin; pH of the skin, mucosal surfaces and body fluids; epithelial turnover; normal flora; transferrin; cilia of the respiratory tract.
- 32. Response of bone to inflammation
- 33. Bone as an organ is composed of: 1- Bone tissue (described in microscopic terms and defined by the relation of its collagen fibers and mineral structure to the bone cells). 2- Cartilage. 3- Fat marrow elements. 4- Vessels. 5- Nerves. 6- Fibrous tissue.
- 34. Macroscopically, two types of bone are recognized: 1- Cortical bone: This is dense compact bone whose outer shell defines the shape of the bone. 2- Coarse cancellous bone: This is also termed spongy, trabecular or marrow bone. It is generally found at the ends of long bones within the medullary canal (medullary space). Note: Changes in the rate of bone turnover (resorption, deposition and formation) are manifested principally in cancellous bone, because cancellous bone has a high surface-to-volume ratio, which means it contains many more bone cells per unit volume than cortical bone does.
- 37. All bones contain both cancellous and cortical elements, but their proportions differ. For example: The skull is formed by outer and inner tables of compact bone, with only a small amount of cancellous bone within the marrow space, called diploë.
- 39. Periosteum: It is a specialized connective tissue that covers the outer surface of all bones and is capable of forming bone. Endosteum: It lines the bone marrow cavity and is made of one layer of osteogenic cells.
- 40. The bone matrix It is composed of: 1- Mineralized (inorganic) phase: Mineralized matrix is composed of hydroxyapatite crystals and other ions as carbonate, citrate, fluoride, chloride, sodium, magnesium, potassium and strontium. 2- Organic matrix: Type I collagen and other proteins as: a- Osteocalcin: It is protein produced by osteoblast. Blood levels of this protein serve as a useful marker of bone formation. b- Osteopontin and Sialoprotein: They are other bone matrix proteins that help anchor cells to the bone matrix.
- 41. 3- Cells: They are osteoprogenitor cells, osteoblasts, osteocytes and osteoclasts. a- Osteoprogenitor cells: They give rise to osteoblasts, which in turn become osteocytes. They are found in the inner layer of periosteum and endosteum. b- Osteoblasts: They are bone-forming cells, as their cytoplasm contains alkaline phosphatase, which is responsible for bone deposition. They are found in the inner layer of periosteum and in endosteum.
- 42. c- Osteocyte: It is the mature bone cell i.e. It is the osteoblast which is completely embedded in bone matrix and isolated in a lacuna. Osteocytes maintain the hardness of the matrix by continuous deposition of calcium salts. d- Osteoclasts: They are bone resorptive cells. They originate from blood monocytes. They are found on the surfaces of bones in small depressions termed Howship's lacunae.
- 44. Bone marrow: It resides in the space enclosed by the cortical bone, termed the marrow space or the medullary canal. It is supported by a delicate connective tissue framework that includes marrow cells and blood vessels. There are three types of marrow: a- Red marrow: It corresponds to hemopoietic tissue. b- Yellow marrow: It appears microscopically as fat tissue. c- Gray or white marrow: It is deficient in hemopoietic elements and is often fibrotic. Note: It is always a pathological tissue in a nongrowing adult bone.
- 45. The blood supply of bone marrow is through: a- Haversian canals which are spaces in the bone of the cortex that course parallel to the long axis of the bone and then they branch and communicate with other similar canals. Each canal contains one or two blood vessels, lymphatics and some nerve fibers. b- Volkmann's canals which are spaces within the cortex that run perpendicular to the long axis of the cortex to connect adjacent Haversian canals. They also contain blood vessels. Note: Each artery has its paired vein and perhaps free nerve endings. Drainage of the veins proceeds from the cortex outward to the periosteal veins.
- 46. Microscopic organization of bone tissue: Microscopic examination reveals two types of bone tissue: 1- Lamellar bone. 2- Woven bone. Both varieties may be mineralized or unmineralized; unmineralized bone is termed osteoid.
- 47. 1- Lamellar bone It is produced slowly and it is highly organized. It forms the adult skeleton. Anything other than lamellar bone in the adult skeleton is abnormal. It is defined by three characteristics which are: 1- Type I collagen fibers have a parallel arrangement. 2- There are few osteocytes in the matrix. 3- Uniform osteocytes in lacunae parallel to the long axis of the collagen fibers.
- 48. Types of lamellar bone: There are four types of lamellar bone: 1- Circumferential bone: It forms the outer periosteal and the inner endosteal lamellar envelopes of the cortex. 2- Concentric lamellar bone: It is arranged around the Haversian canals. Haversian system: Concentric lamellar bone together with the Haversian artery and vein inside the Haversian canal constitutes an osteon; the osteons compose the Haversian system.
- 49. 3- Interstitial lamellar bone: It represents the remnants of either circumferential or concentric lamellar bone, which have been remodeled and are wedged between the osteons. 4- Trabecular lamellar bone: It forms the coarse cancellous bone of the medullary cavity. It exhibits the plates of lamellar bone perforated by marrow spaces. With the exception of trabecular bone, lamellar bone is found in the cortex.
- 51. 2- Woven bone It is defined by: 1- Type I collagen fibers have an irregular arrangement, hence the term woven. 2- There are numerous osteocytes in the matrix. 3- There is variation in the size and shape of the osteocytes. It is more rapidly deposited than lamellar bone. It is haphazardly arranged and of low tensile strength. Collagen fibers of woven bone
- 52. Sites: 1- In the areas surrounding tumors and infections. 2- As a part of the healing fracture. 3- In the developing fetus. Note: The presence of woven bone in the adult skeleton always represents a pathological condition and indicates that reactive tissue has been produced in response to some stress in the bone.
- 53. Osteomyelitis: It is the inflammation of bone and bone marrow, but it is really an inflammation of the soft parts of bone, which are the contents of the medullary cavity, the Haversian canals and the periosteum. The inflammation of bone may be: Osteitis: It is the inflammation of the bone cortex, e.g. it occurs in non-specific chronic osteomyelitis (chronic suppurative osteomyelitis, focal sclerosing osteomyelitis, diffuse sclerosing osteomyelitis).
- 56. Periosteitis: It is the inflammation of the periosteum, e.g. Garre's osteomyelitis. The inflammation of the alveolar bone may occur as periodontitis.
- 58. Osteomyelitis Etiology: The inflammatory lesions in bone are caused mainly by bacteria, rarely by fungi. The most common bacterial causative agents are Staphylococcus aureus, Escherichia coli, Pseudomonas and Klebsiella. They can cause the infection.
- 60. Infection. Direct infection, e.g.: 1- By penetrating wounds. 2- By penetrating fractures, e.g. open fracture of the jaw. 3- By penetrating surgery. 4- By penetrating trauma. 5- By penetrating gunshot injuries. Indirect infection, e.g.: 1- Direct spread from an adjacent infection, e.g. periapical infections such as: a. Abscess, e.g. infection of the jaw from a dental abscess. b. Infected granuloma. c. Infected cyst. 2- Spread of infection following extraction of an infected tooth without antibiotic coverage. 3- Hematogenous spread of bacteria from a distant focus or sepsis. 4- Invasion of bone from adjacent septic arthritis or soft tissue abscesses.
- 63. Osteomyelitis may be acute to start with and may become chronic, or it may be a chronic inflammation, like tuberculosis, from its inception. Classification: Osteomyelitis is divided into acute osteomyelitis and chronic osteomyelitis. Chronic osteomyelitis is either specific or nonspecific; nonspecific chronic osteomyelitis is usually due to nonspecific mixed infections.
- 64. Acute Osteomyelitis Definition: It is a type of osteomyelitis. It is a boil in a bone. The calcified portion takes no active part in the process, but it suffers secondarily from the loss of blood supply and a greater or less portion may die. Acute Suppurative Osteomyelitis: It is a serious form of diffusely spreading acute inflammation of the bone that often causes extensive tissue necrosis.
- 65. Acute Osteomyelitis Histopathology: The metaphyseal area is susceptible to acute osteomyelitis because of the unique vascular supply in this region. Normally, arterioles enter the calcified portion of the growth plate, form a loop, and then drain into the medullary cavity without establishing a capillary bed. This loop system permits slowing and sludging of blood flow, thereby allowing bacteria time to penetrate the walls of the blood vessels and to establish an infective focus within the marrow. This initiates an acute inflammatory response with exudation of protein-rich fluid and neutrophil polymorphs.
- 66. If the organism is virulent and continues to proliferate, it increases pressure on the adjacent thin-walled vessels because they lie in a closed space which is the marrow cavity of bone. This pressure further compromises the vascular supply in this region and produces bone necrosis. By allowing further bacterial proliferation, the necrotic areas coalesce into an avascular zone. Pus and bacteria extend into the endosteal vascular channels that supply the cortex and spread throughout the Volkmann and Haversian canals of the cortex.
- 67. A sinus tract that extends from the cloaca to the skin may become epithelialized by epidermis that grows into the sinus tract, so the sinus tract invariably remains open, continually draining pus, necrotic bone and bacteria. Eventually, pus forms underneath the periosteum, shearing off the perforating arteries of the periosteum and further devitalizing the cortex. The pus flows between the periosteum and the cortex, isolating more bone from its blood supply and may even invade the joint. Eventually, the pus penetrates the periosteum and the skin forming a draining sinus.
- 68. Periosteal new bone formation and reactive bone formation in the marrow tend to wall off the infection. At the same time, the osteoclastic activity resorbs the bone. If the infection is virulent, this attempt to contain it is overwhelmed and the infection races through the bone with virtually no bone formation but rather extensive bone necrosis. More commonly, pluripotential cells modulate into osteoblasts in an attempt to wall off the infection.
- 69. Several lesions may develop. These lesions are: 1- Cloaca. 2- Sequestrum. 3- Brodie abscess. 4- Involucrum.
- 70. 1- Cloaca: It is the hole formed in the bone during the formation of a draining sinus. 2- Sequestrum: It is a fragment of necrotic bone that is embedded in the pus. It is separated from the living bone by the action of the osteoclasts.
- 73. 3- Brodie abscess: It consists of reactive bone from the periosteum and the endosteum, which surrounds and contains the infection. 4- Involucrum: It is a lesion in which periosteal new bone forms a sheath around the necrotic sequestrum. Some of the cells of the osteogenic layer usually survive, and when the acuteness of the infection is past these osteoblasts lay down new bone over the sequestrum in the form of a new case, or involucrum.
- 75. Clinical picture: 1- Throbbing or intense local pain, which is the primary feature of this inflammatory process, with tenderness over the affected region. 2- Pyrexia (high fever). 3- Some tissue swelling. 4- Rapid pulse. 5- ESR (Erythrocyte Sedimentation Rate) is almost always elevated. 6- Leucocytosis (white blood count shows an increase in neutrophils). 7- Paresthesia of the lower lip may occasionally occur when the mandible is involved. 8- Malaise. 9- Chills. Specific for indirect acute inflammation. 10- Painful lymphadenopathy.
- 76. Radiographic diagnosis: 1- As the initial lesions are confined to the soft part of the bone, there are no characteristic x-ray changes in the first one or two weeks of the disease. 2- The x-ray may therefore be normal until bone resorption takes place, surrounded by a zone of sclerosis, or the medullary cavity may show decreased density (diffuse radiolucency). 3- When infection extends through the cortical bone to the periosteal layer, soft tissue swelling and periosteal elevation can be detected radiologically.
- 79. Chronic Osteomyelitis: It is the type of osteomyelitis that may occur after the acute phase, or it may develop without any preceding acute phase. Etiology: The causative agent is usually a mixed infection, most commonly Streptococci and Staphylococci. Histopathology: The factors that maintain chronicity are: 1- A bone cavity which is surrounded by dense sclerosis. 2- Sequestrum, which acts as an irritant and harbours bacteria. 3- Bacteria imprisoned in the fibrous tissue, where they remain dormant and may be activated at any time. 4- Sinuses which lead to the skin surface, favouring secondary infection.
- 82. Clinical features: 1- A history of acute osteomyelitis may be given. 2- The commonest presenting feature (lesion) is a sinus discharging pus and sometimes small pieces of sequestrum, but this is less frequently seen in the jaw. 3- Pain and swelling of the jaw at the affected bone. 4- Atrophy of the surrounding tissues may be found. The mandible, especially the molar area, is more frequently affected than the maxilla.
- 84. Radiographically: It appears primarily as a radiolucent lesion that may show focal zones of opacification. The radiolucent pattern is often described as moth-eaten because of its mottled radiographic appearance. Lesions may be very extensive and margins are often indistinct.
- 85. Non-specific chronic osteomyelitis: 1- Suppurative sclerosing (focal type and diffuse type). 2- Non-suppurative sclerosing (Garre's type).
- 86. 1- Suppurative Sclerosing Osteomyelitis: a- Focal type: Sometimes called focal sclerosing osteomyelitis. It is termed osteopetrosis when associated with a good picture of normal teeth. It is characterized by: Condensing osteitis (a focal bony reaction), which usually occurs in response to a low-grade inflammatory stimulus of the periapical tissues, e.g. at the apex of a tooth with long-standing pulpitis.
- 87. Clinical features: 1- Affects young individuals (below 20 years of age). 2- The associated tooth is very often a grossly carious, non-vital mandibular first molar. 3- The bony lesion is mostly asymptomatic. 4- On rare occasions there is a little pain. Radiographic picture: A sharply defined, well circumscribed, radiopaque area is observed in the jaw bone just below the root apex of the affected tooth. The lamina dura around the root is intact.
- 89. b- Diffuse type: Low-grade infection is important in the etiology and progression of diffuse sclerosing osteomyelitis. Chronic and widespread periodontal disease (periodontitis) appears to provide a portal of entry for bacteria (carious non-vital teeth are less frequently implicated). It shows both sclerotic and osteoclastic activity. (Figure label: reversal lines.)
- 90. Clinical features: 1- It shows chronic course with acute exacerbations of pain and swelling (usually asymptomatic lesion). 2- Occasional drainage or fistula formation may occur. 3- The mandible is more commonly affected than the maxilla. Radiographic picture: 1- Diffuse process typically affecting a large part of the jaw. 2- Ill-defined lesion. 3- In early stages: lucent zones may appear in association with sclerotic masses. 4- In advanced stages: sclerosis dominates the radiographic picture. 5- Periosteal thickening may also be seen.
- 91. 2- Non-suppurative Sclerosing Osteomyelitis (Garre's type): It is characterized by: 1- A prominent periosteal inflammatory reaction (proliferative periosteitis). 2- Subperiosteal reactive new bone deposition. 3- Focal gross thickening of the involved bone. 4- There is neither sequestration nor sinus formation. It most often arises from a periapical abscess of a lower molar, or from infection associated with tooth extraction or partially erupted molars.
- 92. Clinical features: 1- Affects the posterior mandible, usually unilaterally. 2- Asymptomatic bony hard swelling with normal-appearing overlying skin and mucosa. 3- On occasion, slight tenderness may be noted. Radiographic picture: X-ray shows marked thickening and increased density of the outer cortex of the jaw (duplication of the cortex). Partial obliteration of the marrow spaces (the lesion appears centrally as a mottled, i.e. predominantly lucent, lesion). Presence of periapical radiolucency in relation to a grossly carious tooth.
- 94. Specific chronic osteomyelitis: 1- Tuberculous. 2- Syphilitic. 3- Actinomycotic.
- 95. 1- Tuberculous Osteomyelitis: Definition: Tuberculosis of bone is a chronic osteomyelitis occurring in early life. Main effect: It displays more bone destruction than bone formation, yet with a tendency toward limitation of spread and spontaneous healing.
- 96. Osteitis or periosteitis: It tends to be osteitis, since it begins in spongy bone. It is commonest in the vertebrae, the small bones of the hands and feet, and the ends of long bones, including both metaphysis and epiphysis. Clinical picture: It appears as epithelioid granulomas (tubercles) with central caseous necrosis and Langhans (multinucleated) cells, with chronic unremarkable pain. Radiographic picture: It appears as small areas of translucency.
- 97. 2- Syphilitic Osteomyelitis: It may be congenital or acquired. Main effect: It produces prominent reactive bone formation. Osteitis or periosteitis: Congenital syphilis tends to be periosteitis, particularly of long bones.
- 98. Clinical picture: 1- Local swelling with pain and warmth. 2- Drainage of pus through skin. 3- Bone tenderness. Radiographic picture: It appears as radiolucent lesions.
- 99. 3- Actinomycotic Osteomyelitis: It is unfairly common because of its varied presentation. The causative organisms are gram-positive rods which are strict or facultative anaerobes, and the disease is more common in the mandible than the maxilla. Morphologically: They are filamentous and branching in nature. A. israelii, A. bovis, A. naeslundii, A. viscosus and A. odontolyticus are members of the family Actinomycetaceae. Except for A. bovis, all the species are normal inhabitants of the human oral cavity.
- 100. Precipitating factors leading to disease in the cervicofacial region are: 1- Carious teeth. 2- Dental manipulation. 3- Maxillofacial trauma. 4- Deep wounds. Its pathogenesis is related to its ability to act as an intracellular parasite and thus resist phagocytosis, as well as its tendency to spread without respect for tissue planes or anatomic barriers.
- 101. The clinical findings include: Presence of sulphur granules, seen as basophilic masses with a granular center and radiating protrusions, as well as distinctive, beaded actinomyces. Clinical picture: 1- Local mucosal trauma. 2- Soft tissue swelling. 3- Palpable mass, sometimes painful or recurrent, that may be associated with a draining sinus tract with the presence of sulphur colonies.
- 102. Treatment: Four weeks of high-dose intravenous penicillin, followed by a three- to six-month course of oral penicillin plus hydrogen peroxide. Radiographic picture: It is not confirmatory, but there are small punched-out radiolucent areas with irregular and ill-defined margins.
- 105. What is Statistics? It is the science and practice of developing knowledge through the use of quantitative empirical data. It is based on statistical theory, which is a branch of applied mathematics. Statistical theory uses probability theory to model: – Randomness – Uncertainty. Statistics may be considered a branch of decision theory. Statistical practice includes: – Planning of observations – Summarizing observations – Interpreting observations. Statistical practice allows for: – Variability – Uncertainty.
- 106. Applied Statistics: Biostatistics; Business statistics; Economic statistics; Engineering statistics; Statistical physics; Demography; Psychological statistics; Social statistics; Reliability statistics.
- 107. Application Examples of Biostatistics (Statistics in Clinical Medicine): To determine the accuracy of clinical measurements; to compare measurement techniques; to assess diagnostic tests; to determine normal values; to estimate prognosis; to monitor patients; to evaluate bed use; to calculate perinatal mortality rates.
- 108. Statistics and Medical Research: Statistics becomes most intimately involved in medical research. In order to read the results of the enormous amount of research that pours into the medical journals, all doctors should have some understanding of the ways in which: – Studies are designed – Data are collected – Data are analyzed and interpreted.
- 109. What Statistics actually do: Statistics do one of only 2 things: 1- they describe a set of data; 2- they provide a basis for drawing generalizations about a large group when only a small portion of the larger group has been observed (measured). Category 1 is descriptive statistics. Category 2 is inferential statistics.
- 110. Descriptive Statistics: Given the IQ scores for all 8th grade boys at a certain high school, descriptive statistics allow me to identify: the typical IQ; the most frequently occurring IQ; the midpoint of the range of IQ scores. But I cannot interpret the scores to have any meaning or application regarding other 8th grade boys in the same town or in other locations, or other ages and genders. This is what inferential statistics is all about.
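The three descriptive summaries named on this slide can be sketched with Python's standard `statistics` module. The IQ scores below are invented for illustration, and reading "typical" as the mean and "midpoint" as the median is an assumption, not something the slide states.

```python
import statistics

# Hypothetical IQ scores, invented purely for illustration
iq_scores = [95, 100, 100, 105, 110, 100, 120, 90, 105, 100]

typical = statistics.mean(iq_scores)        # assumed reading of "typical IQ"
most_frequent = statistics.mode(iq_scores)  # most frequently occurring IQ
midpoint = statistics.median(iq_scores)     # assumed reading of "midpoint"

print(typical, most_frequent, midpoint)
```

These numbers describe only the listed scores; saying anything about other pupils would need inferential methods.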
- 111. The Population and the Sample: A population is any group of people, all of whom have at least one characteristic in common. A sample is a selected smaller subset of the population. A statistic describes a sample; a parameter describes a population.
- 112. Sampling: To make generalizations from a sample, it needs to be representative of the larger population from which it is taken. In the ideal scientific world, the individuals for the sample would be randomly selected. This requires that each member of the population has an equal chance of being selected each time a selection is made. (Diagram: a statistic computed from the sample is used to draw generalizations about the parameter of the population.)
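A minimal sketch of simple random selection, assuming a hypothetical population of 100 numbered members and a sample size of 10 (both invented here). `random.Random.sample` draws without replacement, so at each draw every remaining member is equally likely to be chosen.

```python
import random

# Hypothetical population: member IDs 1..100 (an assumption for illustration)
population = list(range(1, 101))

rng = random.Random(42)                 # fixed seed so the draw can be repeated
sample = rng.sample(population, k=10)   # simple random sample, no replacement

print(sorted(sample))
```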
- 113. Basic concepts and notation N n X S Population Sample
- 115. Statistical Methods: Plan observations to control their variability (experiment design); summarize a collection of observations (descriptive statistics); reach conclusions about what the observations can tell us (statistical inference).
- 116. Experiment Design In medical research, statistical thinking is heavily involved in the design of experiments, particularly comparative experiments where we wish to study the difference between the effects of two or more treatments. These experiments may be carried out – in the laboratory – on animals – on human volunteers – on human patients in the hospital or community In the case of preventive trials, they may be carried out on currently healthy patients
- 117. Comparison Techniques: We could compare the results of the new treatment on new patients with records of previous results using the old treatment. – This is seldom convincing, because there are many differences between the patients who received the old treatment and the patients who are going to receive the new one. We could ask people to volunteer for the new treatment and give the standard treatment to those who do not volunteer. – The difficulty here is that people who volunteer and people who do not volunteer are likely to be different in many ways apart from the treatment. We can allocate patients to the new or standard treatment and observe the outcome. The way in which patients are allocated can influence the results enormously. There are two basic approaches to patient allocation: – Random – Quasi-random.
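- 118. Random Allocation: Assume that there are 20 subjects to be allocated to two groups, which we shall label A and B. The 20 subjects are assigned the following random number sequence: (3, 4, 6, 2, 9, 7, 5, 3, 2, 6, 9, 7, 9, 3, 7, 2, 3, 3, 2, 4). In the table, subjects with an odd random number fall in group A and those with an even number fall in group B.

  | Subject | Random number | Group | Subject | Random number | Group |
  |---|---|---|---|---|---|
  | 1 | 3 | A | 11 | 9 | A |
  | 2 | 4 | B | 12 | 7 | A |
  | 3 | 6 | B | 13 | 9 | A |
  | 4 | 2 | B | 14 | 3 | A |
  | 5 | 9 | A | 15 | 7 | A |
  | 6 | 7 | A | 16 | 2 | B |
  | 7 | 5 | A | 17 | 3 | A |
  | 8 | 3 | A | 18 | 3 | A |
  | 9 | 2 | B | 19 | 2 | B |
  | 10 | 6 | B | 20 | 4 | B |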
- 119. Random Allocation The system above gives unequal numbers in the two groups, 12 in A and 8 in B. Sometimes it is desired that the groups be of equal size. In that case we can assign Group A to the first ten entries only in the table labeled A, and assign Group B to the remainder of the twenty subjects.
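The allocation can be sketched in a few lines. The odd/even rule is inferred from the slide's table rather than stated on it, and the helper names `allocate_by_parity` and `balance` are hypothetical. The balancing step follows the slide's suggestion: keep group A for the first ten subjects labelled A and move everyone else to B.

```python
# Random digits for subjects 1..20, copied from the slide
random_numbers = [3, 4, 6, 2, 9, 7, 5, 3, 2, 6, 9, 7, 9, 3, 7, 2, 3, 3, 2, 4]


def allocate_by_parity(numbers):
    """Odd digit -> group A, even digit -> group B (rule inferred from the table)."""
    return ["A" if n % 2 == 1 else "B" for n in numbers]


def balance(groups, size_a):
    """Keep 'A' only for the first size_a subjects labelled A; the rest go to B."""
    balanced = []
    kept = 0
    for group in groups:
        if group == "A" and kept < size_a:
            balanced.append("A")
            kept += 1
        else:
            balanced.append("B")
    return balanced


groups = allocate_by_parity(random_numbers)
balanced = balance(groups, size_a=10)

print(groups.count("A"), groups.count("B"))
print(balanced.count("A"), balanced.count("B"))
```

In practice the random digits would come from a random number table or generator rather than a fixed list.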
- 120. Types of Variables: Qualitative variables: attributes, categories. Examples: male/female, registered to vote/not, ethnicity, eye color. Quantitative variables: Discrete: usually take on integer values but can take on fractions when the variable allows (counts, how many). Continuous: can take on any value at any point along an interval (measurements, how much).
- 121. Discrete Data: A set of data is said to be discrete if the values / observations belonging to it are distinct and separate. They can be counted (1, 2, 3, ...). Examples: the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).
- 122. Continuous Data A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data. Examples: height; weight; temperature; the amount of sugar in an orange; the time required to run a mile.
- 123. Scales of Measurement Nominal Scale - Labels represent various levels of a categorical variable. Ordinal Scale - Labels represent an order that indicates either preference or ranking. Interval Scale - Numerical labels indicate order and distance between elements. There is no absolute zero and multiples of measures are not meaningful. Ratio Scale - Numerical labels indicate order and distance between elements. There is an absolute zero and multiples of measures are meaningful.
- 124. Diagrammatic Representation of Data It is often convenient to present data pictorially. Information can be conveyed much more quickly by a diagram than by a table of numbers. This is particularly useful when data are being presented to an audience. A diagram can also help the reader get the salient points of a table of numbers. Unfortunately, unless great care is taken, diagrams can be very misleading and should only serve an illustrative purpose and not a
- 125. Histogram A histogram is a way of summarizing data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides up the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that specific group, and an area proportional to the number of observations falling into that group. This means that the rectangles might be drawn of non-uniform height.
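Because the area of each rectangle, not its height, is proportional to the count, the height of a bar works out as count divided by class width. A small sketch with invented classes of unequal width (the data and the name `bar_heights` are assumptions for illustration):

```python
# Hypothetical classes as (lower bound, upper bound, number of observations)
classes = [(0, 10, 5), (10, 20, 10), (20, 50, 15)]


def bar_heights(classes):
    """Height = count / width, so that height * width (the area) equals the count."""
    return [count / (upper - lower) for lower, upper, count in classes]


heights = bar_heights(classes)
print(heights)
```

Note that the widest class holds the most observations yet does not get the tallest bar, which is why heights alone can mislead when class widths differ.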
- 126. Example The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states was given in the next table. a) Construct a frequency distribution. State interval width and class mark. b) Construct a histogram, c) Construct a relative frequency distribution, d) Construct a cumulative frequency distribution.
- 127. Example – Data List (average daily cost in US dollars, states in alphabetical order): AL 775; AK 1,136; AZ 1,091; AR 678; CA 1,221; CO 961; CT 1,058; DE 1,024; FL 960; GA 775; HI 823; ID 659; IL 917; IN 898; IA 612; KS 666; KY 703; LA 875; ME 738; MD 889; MA 1,036; MI 902; MN 652; MS 555; MO 863; MT 482; NE 626; NV 900; NH 976; NJ 829; NM 1,046; NY 784; NC 763; ND 507; OH 940; OK 797; OR 1,052; PA 861; RI 885; SC 838; SD 506; TN 859; TX 1,010; UT 1,081; VT 676; VA 830; WA 1,143; WV 701; WI 744; WY 537.
- 128. Example – Data Array (the same values sorted from highest to lowest): CA 1,221; WA 1,143; AK 1,136; AZ 1,091; UT 1,081; CT 1,058; OR 1,052; NM 1,046; MA 1,036; DE 1,024; TX 1,010; NH 976; CO 961; FL 960; OH 940; IL 917; MI 902; NV 900; IN 898; MD 889; RI 885; LA 875; MO 863; PA 861; TN 859; SC 838; VA 830; NJ 829; HI 823; OK 797; NY 784; AL 775; GA 775; NC 763; WI 744; ME 738; KY 703; WV 701; AR 678; VT 676; KS 666; ID 659; MN 652; NE 626; IA 612; MS 555; WY 537; ND 507; SD 506; MT 482.
- 129. Example – Frequency Distribution

  | Average daily cost | Number | Mark |
  | --- | --- | --- |
  | $450 – under $550 | 4 | $500 |
  | $550 – under $650 | 3 | $600 |
  | $650 – under $750 | 9 | $700 |
  | $750 – under $850 | 9 | $800 |
  | $850 – under $950 | 11 | $900 |
  | $950 – under $1,050 | 7 | $1,000 |
  | $1,050 – under $1,150 | 6 | $1,100 |
  | $1,150 – under $1,250 | 1 | $1,200 |

  Interval width: $100
- 130. Example – Histogram [Figure: histogram of average daily cost; horizontal axis 500 to 1,200, vertical axis (frequency) 0 to 12]
- 131. Example – Relative Frequency Distribution (Polygon) [Figure: relative frequency polygon; horizontal axis 0 to 1,400, vertical axis 0 to 0.25]
- 132. Example – Cumulative Frequency Distribution

  | Average daily cost | Number | Cum. Freq. |
  | --- | --- | --- |
  | $450 – under $550 | 4 | 4 |
  | $550 – under $650 | 3 | 7 |
  | $650 – under $750 | 9 | 16 |
  | $750 – under $850 | 9 | 25 |
  | $850 – under $950 | 11 | 36 |
  | $950 – under $1,050 | 7 | 43 |
  | $1,050 – under $1,150 | 6 | 49 |
  | $1,150 – under $1,250 | 1 | 50 |

- 133. Example – Cumulative Frequency Distribution & Cumulative Relative Frequency Distribution

  | Average daily cost | Cum. Freq. | Cum. Rel. Freq. |
  | --- | --- | --- |
  | $450 – under $550 | 4 | 4/50 = .08 |
  | $550 – under $650 | 7 | 7/50 = .14 |
  | $650 – under $750 | 16 | 16/50 = .32 |
  | $750 – under $850 | 25 | 25/50 = .50 |
  | $850 – under $950 | 36 | 36/50 = .72 |
  | $950 – under $1,050 | 43 | 43/50 = .86 |
  | $1,050 – under $1,150 | 49 | 49/50 = .98 |
  | $1,150 – under $1,250 | 50 | 50/50 = 1.00 |
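The tallying behind the frequency, relative frequency, and cumulative tables of this example can be sketched in Python. This is only an illustration and not part of the original slides: the names `costs`, `frequency_table`, `LOWER`, `WIDTH`, and `N_CLASSES` are made up here, and the class limits (starting at $450, width $100) are taken from the frequency distribution slide.

```python
# Average daily hospital cost (USD) for the 50 states, copied from the data list slide.
costs = [
    775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
    1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
    1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
    1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
    960, 738, 976, 885, 744, 775, 889, 829, 838, 537,
]

LOWER, WIDTH, N_CLASSES = 450, 100, 8  # class limits assumed from the slide


def frequency_table(values):
    """Return rows of (lower, upper, freq, rel_freq, cum_freq, cum_rel_freq)."""
    counts = [0] * N_CLASSES
    for v in values:
        counts[(v - LOWER) // WIDTH] += 1  # class index of "lower – under upper"
    rows, cum = [], 0
    for k, freq in enumerate(counts):
        cum += freq
        lower = LOWER + k * WIDTH
        rows.append((lower, lower + WIDTH, freq, freq / len(values),
                     cum, cum / len(values)))
    return rows


table = frequency_table(costs)
for lower, upper, freq, rel, cum, cum_rel in table:
    print(f"${lower:,} - under ${upper:,}: f={freq:2d} rel={rel:.2f} "
          f"cum={cum:2d} cum_rel={cum_rel:.2f}")
```

Each class mark is the midpoint of its class, `lower + WIDTH / 2`.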
- 134. Example – Cumulative Distribution [Figure: cumulative frequency curve; horizontal axis 0 to 1,200, vertical axis (cumulative frequency) 0 to 50]
- 136. Pie Charts The pie chart or pie diagram is the equivalent of the histogram for qualitative data. It shows the relative frequency for each category by dividing a circle into sectors, the angles of which are proportional to the relative frequency. We thus multiply each relative frequency by 360°, to give the corresponding angle in degrees.
- 137. Pie Chart Calculations of the Distribution of Causes of Death

  | Cause of death | Frequency | Relative frequency (%) | Angle (degrees) |
  | --- | --- | --- | --- |
  | Circulatory system | 143,559 | 49.471 | 178 |
  | Neoplasms (cancers) | 62,767 | 21.630 | 78 |
  | Respiratory system | 43,886 | 15.123 | 54 |
  | Digestive system | 9,147 | 3.152 | 11 |
  | Injury and poisoning | 7,736 | 2.666 | 6 |
  | Others | 23,094 | 7.958 | 29 |
- 138. Major Causes of Death Pie Chart
- 139. Bar Charts Histograms and pie charts depict the distribution of a single variable. A bar chart or bar diagram shows the relationship between two variables, usually one being quantitative and the other qualitative or a grouped quantitative variable, such as time in years. The values of the first variable are shown by the heights of bars, one bar for each category of the second variable. Bar charts can be used to represent
- 140. Annual Standardized Mortality Rate from Cancer of Esophagus (England & Wales, 1960-1969)

  | Year | Mortality rate |
  | --- | --- |
  | 1960 | 5.1 |
  | 1961 | 5.0 |
  | 1962 | 5.2 |
  | 1963 | 5.2 |
  | 1964 | 5.2 |
  | 1965 | 5.4 |
  | 1966 | 5.4 |
  | 1967 | 5.6 |
  | 1968 | 5.8 |
  | 1969 | 5.9 |
- 141. Bar chart showing the relationship between mortality due to cancer of the esophagus and year
- 143. Key Terms Measures of Central Tendency, The Center: Mean (µ, population; \( \bar{x} \), sample), Weighted Mean, Median, Mode
- 144. Key Terms Measures of Dispersion, The Spread Range Mean absolute deviation Variance Standard deviation Inter-quartile range Inter-quartile deviation Coefficient of variation
- 145. Key Terms Measures of Relative Position Quantiles Quartiles Deciles Percentiles Residuals Standardized values
- 146. The Mean Arithmetic average = (sum of all values) / (number of values). Population: \( \mu = \frac{\sum x_i}{N} \). Sample: \( \bar{x} = \frac{\sum x_i}{n} \).
- 147. The Weighted Mean When what you have is grouped data, compute the mean using \( \mu = \frac{\sum w_i x_i}{\sum w_i} \).
- 148. The Median To find the median: 1. Put the data in an array. 2A. If the data set has an ODD number of numbers, the median is the middle value. 2B. If the data set has an EVEN number of numbers, the median is the AVERAGE of the middle two values. (Note that the median of an even set of data values is not necessarily a member of the set of values.)
- 149. The Mode The mode is the most frequent value. While there is just one value for the mean and one value for the median, there may be more than one value for the mode of a data set. The mode tends to be less frequently used than the mean or the median. [Figure: histogram; horizontal axis 500 to 1,200, vertical axis 0 to 12]
- 150. Comparing Measures of Central Tendency If mean = median = mode, the shape of the distribution is symmetric. If mode < median < mean, the distribution trails to the right and is positively skewed. If mean < median < mode, the distribution trails to the left and is negatively skewed.
- 151. The Range The range is the distance between the smallest and the largest data value in the set. Range = largest value – smallest value Sometimes range is reported as an interval, anchored between the smallest and largest data value, rather than the actual width of that interval.
- 152. Residuals Residuals are the differences between each data value in the set and the group mean: for a population, \( x_i - \mu \); for a sample, \( x_i - \bar{x} \).
- 153. The Variance Variance is one of the most frequently used measures of spread. For a population, \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N} \); for a sample, \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \). The right side of each equation is often used as a computational shortcut.
- 154. The Standard Deviation Since variance is given in squared units, we often find uses for the standard deviation, which is the square root of variance: for a population, \( \sigma = \sqrt{\sigma^2} \); for a sample, \( s = \sqrt{s^2} \).
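A short Python sketch may help check that the definitional form and the computational shortcut of the variance agree. It is an illustration only: the function names and the five scores are assumptions chosen for this example, not part of the slides.

```python
import math


def population_variance(xs):
    """Return (definitional, shortcut) population variance, dividing by N."""
    n = len(xs)
    mu = sum(xs) / n
    definitional = sum((x - mu) ** 2 for x in xs) / n
    shortcut = (sum(x * x for x in xs) - n * mu ** 2) / n
    return definitional, shortcut


def sample_variance(xs):
    """Return (definitional, shortcut) sample variance, dividing by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    definitional = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    shortcut = (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)
    return definitional, shortcut


scores = [2, 3, 4, 5, 6]  # illustrative values (assumption)
pop_def, pop_short = population_variance(scores)
samp_def, samp_short = sample_variance(scores)
pop_sd = math.sqrt(pop_def)    # standard deviation = square root of variance
samp_sd = math.sqrt(samp_def)
print(pop_def, pop_short, samp_def, samp_short, pop_sd, samp_sd)
```

In floating-point arithmetic the shortcut form can lose precision when the values are large relative to their spread, so the definitional form is usually the safer choice in code.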
- 155. Quartiles One of the most frequently used quantiles is the quartile. Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations. To find the first, second, and third quartiles: 1. Arrange the N data values into an array. 2. First quartile, Q1 = data value at position (N + 1)/4 3. Second quartile, Q2 = data value at position 2(N + 1)/4 4. Third quartile, Q3 = data value at position 3(N + 1)/4
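The position rule above can be sketched in Python. The slide does not say what to do when (N + 1)/4 is not a whole number; the linear interpolation between neighbouring values used below is an assumption of this sketch, as are the function name `quartile` and the choice of example data (the seven lead levels of the unpolluted-area group from the later t-test example).

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k * (N + 1) / 4.

    Fractional positions are resolved by linear interpolation (assumption).
    """
    data = sorted(values)          # step 1: arrange the values into an array
    n = len(data)
    pos = k * (n + 1) / 4          # position given on the slide
    pos = min(max(pos, 1), n)      # keep the position inside the array
    lower = int(pos)               # whole part of the position
    frac = pos - lower             # fractional part of the position
    upper = min(lower + 1, n)
    return data[lower - 1] + frac * (data[upper - 1] - data[lower - 1])


lead_levels = [9, 13, 8, 15, 17, 12, 11]
q1, q2, q3 = (quartile(lead_levels, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

With N = 7 the positions are 2, 4, and 6, so no interpolation is needed and Q2 coincides with the median.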
- 156. Quartiles [Figure: cumulative frequency (0 to 100) plotted against Ln_YarnS (0.0 to 6.0), with Q1, Q2, and Q3 marked]
- 157. Standardized Values How far above or below the population mean an individual value lies, in units of standard deviation. “How far above or below” is (data value – mean), which is the residual. “In units of standard deviation” means divided by \( \sigma \). Standardized individual value: \( z = \frac{x - \mu}{\sigma} \). A negative z means the data value falls below the mean.
- 159. Probability of an Event \( \Pr(\text{Event } A) = \frac{\text{No. of times event } A \text{ can occur}}{\text{Total No. of occurrences}} \)
- 160. Success = at least one “1”
- 161. Success = at least one “1,1”
- 162. P(Failure) = P(no “1”) = \( (5/6)^4 = .482 \), so P(Success) = .518. P(Failure) = P(no “1,1”) = \( (35/36)^{24} = .509 \), so P(Success) = .491.
- 163. To add or to multiply ? P(outcomes add up to “10”) =?
- 164. P(outcomes add up to “10”) = 3/36. 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6 4,1 4,2 4,3 4,4 4,5 4,6 5,1 5,2 5,3 5,4 5,5 5,6 6,1 6,2 6,3 6,4 6,5 6,6
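The two dice calculations can be checked with a few lines of Python. This is a sketch for illustration; the variable names are made up, and it assumes fair six-sided dice with independent rolls. Probabilities of independent events are multiplied (no “1” in four rolls), while probabilities of mutually exclusive outcomes are added (the separate ways of reaching a total of 10).

```python
from fractions import Fraction
from itertools import product

# Multiply: independent rolls, then take the complement of "no success".
p_at_least_one_1 = 1 - Fraction(5, 6) ** 4        # four rolls of one die
p_at_least_one_11 = 1 - Fraction(35, 36) ** 24    # 24 rolls of two dice

# Add: enumerate the 36 equally likely outcomes and count those summing to 10.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 10]
p_sum_10 = Fraction(len(favourable), len(outcomes))

print(float(p_at_least_one_1), float(p_at_least_one_11))
print(favourable, p_sum_10)
```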
- 165. Mutually Exclusive Events Two events are mutually exclusive (or disjoint) if it is impossible for them to occur together. If two events, each with nonzero probability, are mutually exclusive, they cannot be independent, and vice versa.
- 166. Example: A subject in a study cannot be both male and female. A subject cannot be aged both 20 and 30. However, a subject could be both male and 20, and a subject could be both female and 30.
- 167. Independent Events Two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur; that is, the events have no influence on each other. If two events, each with nonzero probability, are independent, then they cannot be mutually exclusive (disjoint), and vice versa.
- 168. Example Suppose that a man and a woman each have a pack of 52 playing cards. Each draws a card from his/her pack. Find the probability that they each draw the ace of clubs. We define the events A = 'the man draws the ace of clubs' B = 'the woman draws the ace of clubs' Clearly events A and B are independent so, P(A and B) = P(A)•P(B) = (1/52)×(1/52) = 1/2704 That is, there is a very small chance that the man and the woman will both draw the ace of clubs.
- 169. Conditional Probability Suppose you go out for lunch at the same place and time every Friday. You are served lunch within 15 minutes with probability 0.9. However, given that you notice that the restaurant is exceptionally busy, the probability of being served lunch within 15 minutes may drop to 0.7. This is the conditional probability of being served lunch within 15 minutes given that the restaurant is exceptionally busy.
- 170. The usual notation for "event A occurs given that event B has occurred" is A|B (A given B). The symbol | is a vertical line and does not imply division. P(A|B) denotes the probability that event A will occur given that event B has occurred already.
- 171. A rule that can be used to determine a conditional probability from unconditional probabilities is P(A|B) = P(A and B) / P(B) where, P(A|B) = the (conditional) probability that event A will occur given that event B has occurred already P(A and B) = the (unconditional) probability that event A and event B occur P(B) = the (unconditional) probability that event B occurs
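As a small worked illustration of the rule P(A|B) = P(A and B) / P(B), the sketch below uses two fair dice. The events are assumptions chosen for this example only: A = “the two dice sum to 10” and B = “the first die shows 4 or more”.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def prob(event):
    """Unconditional probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_a(o):
    return o[0] + o[1] == 10   # event A (illustrative)


def is_b(o):
    return o[0] >= 4           # event B (illustrative)


p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)
print(p_a_and_b, p_b, p_a_given_b)
```

Here P(A|B) differs from the unconditional P(A), so knowing that B occurred does change the probability of A; the two events are not independent.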
- 172. Binomial (Bernoulli) Distribution Bernoulli trial: a random event that can take on only one of two possible outcomes, with the outcomes arbitrarily denoted as either – “success” or “yes” or “true” – “failure” or “no” or “false” For example, flipping a coin is a Bernoulli trial, since the outcome is either a head (yes) or a tail (no).
- 173. Binomial Distribution X = number of successes in n trials. \( P(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x} \), \( 0 \le x \le n \). [Figure: panels for n = 6 with x = 0, x = 1, and x = 3]
- 174. Binomial Distribution [Figure: binomial probabilities b(i; 50, p) plotted against i = 0 to 50 for p = .25, p = .5, and p = .75]
- 175. Binomial Distribution [Figure: histogram of sample C1, shown with the plot of b(i; 50, p) for p = .25, p = .5, and p = .75] Sample values: 13 16 20 11 12 10 12 20 16 15 10 12 9 18 12 11 11 9 11 14 13 8 4 13 12 14 11 14 15 12 18 13 7 11 9 15 11 8 11 16 9 12 12 18 15 13 9 15 12 12
- 176. APPLICATION EXAMPLES (from Medicine) • The number of patients out of n that respond to treatment. • The number of people in a community of n people that have asthma. • The number of people in a group of n intravenous drug users who are HIV positive.
- 177. Binomial Distribution Probability that a patient with swollen leg has a clot = 0.3 What is the distribution of patients with a clot among a total of 10 patients with swollen leg. No. of Patients Probability
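The distribution asked for here can be computed directly from the binomial formula. The following Python sketch is an illustration, not the table from the original slide; the names `binomial_pmf`, `n_patients`, and `p_clot` are made up for this example, and it assumes the 10 patients are independent with the same clot probability of 0.3.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n_patients, p_clot = 10, 0.3
distribution = {x: binomial_pmf(x, n_patients, p_clot)
                for x in range(n_patients + 1)}

for x, prob in distribution.items():
    print(f"{x:2d} patients with a clot: {prob:.4f}")
```

The most likely count is 3 patients, which matches the mean np = 3.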
- 178. POISSON DISTRIBUTION Used to describe a number of events (usually rare) in an interval given an average number of events. Gives the number of times a particular event occurs in a given unit interval. The mean number of events in each unit will be denoted by λ. Unit intervals may be in units of area, volume, etc. Most commonly, however, unit intervals are
- 179. Poisson Distribution X = number of occurrences of an event. \( P(x) = \frac{\lambda^x e^{-\lambda}}{x!} \), \( x \ge 0 \). [Figure: panels for x = 0, x = 2, and x = 8 events]
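A minimal Python sketch of the Poisson probability function follows. The mean of λ = 2 events per unit interval is an assumption picked only for illustration, as are the names `poisson_pmf`, `lam`, and `probs`.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam: lam^x * e^(-lam) / x!."""
    return lam ** x * exp(-lam) / factorial(x)


lam = 2.0  # assumed mean number of events per unit interval
probs = [poisson_pmf(x, lam) for x in range(21)]
mean_estimate = sum(x * p for x, p in enumerate(probs))

for x in (0, 2, 8):
    print(f"P(X = {x}) = {probs[x]:.4f}")
```

Summing x · P(x) over the listed values recovers the mean λ up to a negligible truncation error, since probabilities beyond x = 20 are vanishingly small for this λ.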
- 181. Applications of Poisson Distribution Modeling the distribution of phone calls The arrivals of trucks and cars at a tollbooth The number of accidents at an intersection Counting nuclear decay events The demand of patients for service at a health institution
- 182. Normal Distribution The normal distribution (also called the Gaussian distribution) is a family of distributions recognized as being symmetrical, unimodal, and bell-shaped. The normal distribution is characterized by two parameters: 1. The mean (µ) determines the distribution’s location. Fig. 1 shows two normal distributions with different means. 2. The standard deviation (σ) of a particular normal distribution determines its spread. Fig. 2 demonstrates two normal distributions with different spreads.
- 183. The z-score Assume a normally distributed population of scores X. The z-score (i.e., standardized X) is defined by \( z = \frac{X - \mu_X}{\sigma_X} \). And, if the distribution of X is normal, or at least approximately normal, you can then take that z-score and refer it to a table of the standard normal distribution to figure out the proportion of scores higher than X, or lower than X, etc.
- 184. Fig. 1. Two Normal distributions with different means
- 185. Fig. 2. Two Normal distributions with different standard deviations
- 186. Normal Distribution [Figure: normal density curves, y (0.0 to 0.4) against x (0 to 20), for Sigma = 1, Sigma = 2, and Sigma = 3]
- 187. Normal Distribution X P(<X) P(Xi< <Xi+1) -3.0 0.001350 0.004432 -2.9 0.001866 0.005953 -2.8 0.002555 0.007915 -2.7 0.003467 0.010421 -2.6 0.004661 0.013583 -2.5 0.006210 0.017528 -2.4 0.008198 0.022395 -2.3 0.010724 0.028327 -2.2 0.013903 0.035475 -2.1 0.017864 0.043984 -2.0 0.022750 0.053991 -1.9 0.028717 0.065616 -1.8 0.035930 0.078950 -1.7 0.044565 0.094049 -1.6 0.054799 0.110921 -1.5 0.066807 0.129518 -1.4 0.080757 0.149727 -1.3 0.096800 0.171369 -1.2 0.115070 0.194186 -1.1 0.135666 0.217852 -1.0 0.158655 0.241971 -0.9 0.184060 0.266085 -0.8 0.211855 0.289692 -0.7 0.241964 0.312254 -0.6 0.274253 0.333225 -0.5 0.308538 0.352065 -0.4 0.344578 0.368270 -0.3 0.382089 0.381388 -0.2 0.420740 0.391043 -0.1 0.460172 0.396953 0.0 0.500000 0.398942 X P(<X) P(Xi< <Xi+1) 0.0 0.500000 0.398942 0.1 0.539828 0.396953 0.2 0.579260 0.391043 0.3 0.617911 0.381388 0.4 0.655422 0.368270 0.5 0.691462 0.352065 0.6 0.725747 0.333225 0.7 0.758036 0.312254 0.8 0.788145 0.289692 0.9 0.815940 0.266085 1.0 0.841345 0.241971 1.1 0.864334 0.217852 1.2 0.884930 0.194186 1.3 0.903200 0.171369 1.4 0.919243 0.149727 1.5 0.933193 0.129518 1.6 0.945201 0.110921 1.7 0.955435 0.094049 1.8 0.964070 0.078950 1.9 0.971283 0.065616 2.0 0.977250 0.053991 2.1 0.982136 0.043984 2.2 0.986097 0.035475 2.3 0.989276 0.028327 2.4 0.991802 0.022395 2.5 0.993790 0.017528 2.6 0.995339 0.013583 2.7 0.996533 0.010421 2.8 0.997445 0.007915 2.9 0.998134 0.005953 3.0 0.998650 0.004432 N(0,1)
- 188. Normal Distribution [Figure: histogram of sample C1, shown with the normal density curves for Sigma = 1, Sigma = 2, and Sigma = 3] Sample values: 7.9006 11.5151 9.9542 9.4493 8.2387 10.4707 9.4041 9.3517 10.5664 10.9079 10.0077 12.5188 9.6937 10.0757 10.1616 10.2881 9.8560 10.0014 9.8467 11.5006 10.2982 9.6023 9.7238 11.5413 8.4595 9.2372 11.0408 12.8996 9.5590 9.1041 8.9170 9.7734 7.9844 8.3484 11.3703 10.6260 10.0952 11.4019 8.9842 9.3783 9.7574 7.9312 8.1566 9.9305 9.1158 8.6436 10.4689 9.3356 10.8788 7.8790
- 189. Sampling Distributions Imagine drawing (with replacement) all possible samples of size n from a population, and for each sample, calculating a statistic -- e.g., the sample mean. The frequency distribution of those sample means would be the sampling distribution of the mean (for samples of size n drawn from that particular population). Normally, one thinks of sampling from relatively large populations, but the concept of a sampling distribution can be illustrated with a small population.
- 190. Example A population consisted of the following 5 scores: 2, 3, 4, 5, and 6. The population mean = 4, and the population standard deviation (dividing by N) = 1.414. If we draw (with replacement) all possible samples of 2 from this population, we would end up with the 25 samples shown in Table. This distribution (histogram) of sample means is called the sampling distribution of the mean for samples of n=2 from the population of interest (i.e., our population of 5 scores)
- 191. All possible samples of n=2 from a population of 5 scores Mean of the sample means = 4.000 SD of the sample means = 1.000 (SD calculated with division by N)
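The small-population example can be reproduced by brute force. The Python sketch below (variable names are illustrative) enumerates all 25 ordered samples of size 2 drawn with replacement and summarizes their means, dividing by N in the standard deviations as the slides do.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 3, 4, 5, 6]
n = 2

# All possible samples of size n drawn with replacement (5 * 5 = 25 of them).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)              # division by N
predicted_se = pstdev(population) / sqrt(n)     # population SD / sqrt(n)

print(len(samples), mean_of_means, sd_of_means, predicted_se)
```

The standard deviation of the sample means agrees with the population standard deviation divided by the square root of the sample size, which is the standard error of the mean.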
- 193. The Central Limit Theorem (CLT) The mean of the sampling distribution of the mean = the population mean. The SD of the sampling distribution of the mean = the standard error (SE) of the mean = the population standard deviation divided by the square root of the sample size. Putting these statements into symbols: \( \mu_{\bar{X}} = \mu_X \) { mean of the sample means = the population mean } and \( \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \) { SE of mean = population SD over square root of n }
- 194. What the CLT tells us about the shape of the sampling distribution The central limit theorem also provides us with some very helpful information about the shape of the sampling distribution of the mean. Specifically, it tells us the conditions under which the sampling distribution of the mean is normally distributed, or at least approximately normal, where approximately means close enough to treat as normal for practical purposes. The shape of the sampling distribution depends on two factors: – the shape of the population from which the sample has been drawn, and – The sample size.
- 195. The Shape of the Sampling Distribution If the population from which you sampled is itself normally distributed, then the sampling distribution of the mean will be normal, regardless of sample size. (Even for sample size = 1, the sampling distribution of the mean will be normal, because it will be an exact copy of the population distribution). If the population distribution is reasonably symmetrical (i.e., not too skewed, reasonably normal looking), then the sampling distribution of the mean will be approximately normal for samples of 30 or greater. If the population shape is as far from normal as possible, the sampling distribution of the mean will still be approximately normal for sample sizes of 300 or greater.
- 196. The z-scores for the Sampling Distribution of the Means Based on what we learned from the central limit theorem, we are now in a position to compute a z-score as follows: \( z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} \). And, if the sampling distribution of \( \bar{X} \) is normal, or at least approximately normal, we may then refer this value of z to the standard normal distribution, just as we did when we were using raw scores.
- 197. An Example. Here is a (fictitious) newspaper advertisement for a program designed to increase intelligence of school children:
- 198. Example (Contd.) An expert on IQ knows that in the general population of children, the mean IQ = 100, and the population SD = 15 (for the WISC, at least). He also knows that IQ is (approximately) normally distributed in the population. Equipped with this information, you can now address questions such as: If the n = 25 children from Dundas are a random sample from the general population of children, A. What is the probability of getting a sample mean of 108 or higher? B. What is the probability of getting a sample mean of 92 or lower? C. How high would the sample mean have to be for you to say that the probability of getting a mean that high (or higher) was 0.05 (or 5%)? D. How low would the sample mean have to be for you to say that the
- 199. Solution If we have sampled from the general population of children, as we are assuming, then the population from which we have sampled is at least approximately normal. Therefore, the sampling distribution of the mean will be normal, regardless of sample size. Therefore, we can compute a z-score, and refer it to the table of the standard normal distribution. So, for part (A): \( z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} = \frac{108 - 100}{15 / \sqrt{25}} = \frac{8}{3} = 2.667 \). And from a table of the standard normal distribution we can see that the probability of a z-score greater than or equal to 2.667 = 0.0038. Translating that back to the original units, we could say that the probability of getting a sample mean of 108 (or greater) is .0038 (assuming that the 25 children are a random sample from the general population).
- 200. For part (B), do the same, but replace 108 with 92: \( z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} = \frac{92 - 100}{15 / \sqrt{25}} = \frac{-8}{3} = -2.667 \). And the probability of a sample mean less than or equal to 92 is also equal to 0.0038. Had we asked for the probability of a sample mean that is either 108 or greater, or 92 or less, the answer would be 0.0038 + 0.0038 = 0.0076. Part (C) above amounts to the same thing as asking, "What sample mean corresponds to a z-score of 1.645?", because we know that p(z ≥ 1.645) = 0.05. We can start out with the usual z-score formula and solve for the corresponding value of \( \bar{X} \). Because \( z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \), we should have \( \bar{X} = z\,\sigma_{\bar{X}} + \mu_{\bar{X}} = 1.645 \cdot \frac{15}{\sqrt{25}} + 100 = 104.935 \). So, had we obtained a sample mean of about 105, we could have concluded that the probability of a mean that high or higher was .05 (or 5%).
- 201. For part (D), because of the symmetry of the standard normal distribution about 0, we would use the same method, but substituting -1.645 for 1.645. This would yield an answer of 100 - 4.935 = 95.065. So the probability of a sample mean less than or equal to 95 is also 5%.
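Parts (A) to (D) can be checked numerically with Python's standard library, which replaces the table lookup. This is a sketch under the same assumptions as the example (population mean 100, SD 15, n = 25, normal sampling distribution); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25
se = sigma / sqrt(n)                 # standard error of the mean = 3
sampling_dist = NormalDist(mu, se)   # sampling distribution of the mean

z_a = (108 - mu) / se                # z-score for part (A)
p_a = 1 - sampling_dist.cdf(108)     # (A) P(sample mean >= 108)
p_b = sampling_dist.cdf(92)          # (B) P(sample mean <= 92)
cut_c = sampling_dist.inv_cdf(0.95)  # (C) mean with 5% of the area above it
cut_d = sampling_dist.inv_cdf(0.05)  # (D) mean with 5% of the area below it

print(round(z_a, 3), round(p_a, 4), round(p_b, 4),
      round(cut_c, 3), round(cut_d, 3))
```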
- 202. Hypothesis Testing
- 203. What hypotheses would come out as a byproduct of the analysis of the above data? One has two hypotheses: – Null hypothesis – Alternative hypothesis These two hypotheses are mutually exclusive and exhaustive. In other words, they cannot share any outcomes in common, but together must account for all possible outcomes. Informally, the null hypothesis typically states something along the lines of "there is no treatment effect" or "there is no difference between the groups". The alternative hypothesis typically states that "there is a treatment effect" or that "there is a difference between the groups". Furthermore, an alternative hypothesis may be directional or non-directional. That is, it may or may not specify the direction of the difference between the groups.
- 204. A directional alternative hypothesis Ho: μ ≤ 100 H1: μ > 100 This pair of hypotheses can be summarized as follows. If the alternative hypothesis is true, the sample of 25 children we have drawn is from a population with mean IQ greater than 100. But if the null hypothesis is true, the sample is from a population with mean IQ equal to or less than 100. Thus, we would only be in a position to reject the null hypothesis if the sample mean is greater than 100 by a sufficient amount. If the sample mean is less than 100, no matter by how much, we would not be able to reject Ho .
- 205. How much greater than 100 must the sample mean be for us to be comfortable in rejecting the null hypothesis? The answer that most disciplines use by convention is the following: the difference between \( \bar{X} \) and µ must be large enough that the probability of a difference this large occurring by chance (given a true null hypothesis) is 5% or less. The observed sample mean for this example was 108. As we saw earlier, this corresponds to a z-score of 2.667, and p(z ≥ 2.667) = 0.0038. (The value of 0.0038 is the probability that a difference between \( \bar{X} \) and µ at least this large would arise by chance if the null hypothesis were true.) Therefore, we could reject Ho, and we would act as if the sample was drawn from a population in which mean IQ is greater than 100.
- 206. A non-directional alternative hypothesis Ho: μ = 100 H1: μ ≠ 100 In this case, the null hypothesis states that the 25 children are a random sample from a population with mean IQ = 100, and the alternative hypothesis says they are not, but it does not specify the direction of the difference from 100. In the directional test, we needed to have \( \bar{X} \) > 100 by a sufficient amount in order to reject Ho. But in this case, with a non-directional alternative hypothesis, we may reject Ho if \( \bar{X} \) < 100 or if \( \bar{X} \) > 100, provided the difference is large enough. For this example, the sample mean = 108. This represents a difference of +8 from the population mean (under a true null hypothesis). Because we are interested in both tails of the distribution, we must figure out the probability of a difference of +8 or greater, or of -8 or lower. In other words, p(\( \bar{X} \) ≥ 108) + p(\( \bar{X} \) ≤ 92) = .0038 + .0038 = .0076.
- 207. Single sample t-test (when σ is not known) In many real-world cases of hypothesis testing, one does not know the standard deviation of the population. In such cases, it must be estimated using the sample standard deviation. That is, s (calculated with division by n-1) is used to estimate σ. Other than that, the calculations are as we saw for the z-test for a single sample, but the test statistic is called t, not z: \( t = \frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} \), with n – 1 degrees of freedom (df = n – 1). Here we have \( s_{\bar{X}} = \frac{s}{\sqrt{n}} \) and \( s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS_X}{n - 1}} \). N.B. There are n-1 degrees of freedom whenever you calculate a sample variance (or standard deviation).
- 208. To calculate the p-value for a single sample z-test, we used the standard normal distribution. For a single sample t-test, we must use a t-distribution with n-1 degrees of freedom. As this implies, there is a whole family of t-distributions, with degrees of freedom ranging from 1 to infinity ∞. All t-distributions are symmetrical about 0, like the standard normal. In fact, the t-distribution with df = ∞ is identical to the standard normal distribution. t-distributions with df < ∞ have lower peaks and thicker tails than the standard normal distribution.
- 209. Probability density functions of: the standard normal distribution (the highest peak with the thinnest tails); the t-distribution with df =10 (intermediate peak and tails); and the t-distribution with df=2 (the lowest peak and thickest tails). The dotted lines are at -1.96 and +1.96, the critical values of z for a two-tailed test with alpha = .05. For all t-distributions with df < ∞ , the proportion of area beyond - 1.96 and +1.96 is greater than .05. The lower the degrees of freedom, the thicker the tails, and the greater the
- 210. Area beyond critical values of t = ±1.96 in various t-distributions. The t-distribution with df = ∞ is identical to the standard normal distribution.
- 211. Example of single-sample t-test A researcher believes that in recent years women have been getting taller. She knows that 10 years ago the average height of young adult women living in her city was 63 inches. The standard deviation is unknown. She randomly samples eight young adult women currently residing in her city and measures their heights. The following data are obtained: [64, 66, 68, 60, 62, 65, 66, 63.] The null hypothesis is that these 8 women are a random sample from a population in which the mean height is 63 inches. The non-directional alternative states that the women are a random sample from a population in which the mean is not 63 inches.
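- 212. Solution The sample mean is 64.25. Because the population standard deviation is not known, we must estimate it using the sample standard deviation: \( s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(64 - 64.25)^2 + (66 - 64.25)^2 + \dots + (63 - 64.25)^2}{7}} = 2.5495 \). We can now use the sample standard deviation to estimate the standard error of the mean: Estimated SE of mean \( = s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{2.5495}{\sqrt{8}} = 0.901 \). And finally: \( t = \frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} = \frac{64.25 - 63}{0.901} = 1.387 \).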
- 213. This value of t can be referred to a t-distribution with df = n-1 = 7. Doing so, it is found that the conditional probability of obtaining a t-statistic with absolute value equal to or greater than 1.387 is equal to 0.208. Therefore, assuming that alpha had been set at the usual 0.05 level, the researcher cannot reject the null hypothesis.
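The arithmetic of this single-sample t-test can be sketched in Python using only the standard library. The variable names are illustrative. The p-value itself needs the cumulative t-distribution, which the standard library does not provide, so it is not computed here; a statistics package or a t-table would be used for that step.

```python
from math import sqrt
from statistics import mean, stdev

heights = [64, 66, 68, 60, 62, 65, 66, 63]  # data from the example
mu_0 = 63                                   # mean under the null hypothesis

n = len(heights)
x_bar = mean(heights)
s = stdev(heights)          # sample SD, division by n - 1
se = s / sqrt(n)            # estimated standard error of the mean
t = (x_bar - mu_0) / se     # test statistic
df = n - 1                  # degrees of freedom

print(x_bar, round(s, 4), round(se, 3), round(t, 3), df)
```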
- 214. Paired (or related samples) t-test Suppose you have either 2 scores for each person (e.g., before and after) or matched pairs of scores (e.g., husband and wife pairs, or twin pairs). The paired t-test may be used in this case, given that its assumptions are met adequately. Quite simply, the paired t-test is just a single-sample t-test performed on the difference scores. That is, for each matched pair, compute a difference score. Whether you subtract Score (1) from Score (2) or vice versa does not matter, so long as you do it the same way for each pair. Then perform a single-sample t-test on those differences. The null hypothesis for this test is that the difference scores are a random sample from a population in which the mean difference has some value which you specify.
- 215. For example, suppose you found some old research which reported that on average, husbands were 5 inches taller than their wives. If you wished to test the null hypothesis that the difference is still 5 inches today (despite the overall increase in height), your null hypothesis would state that your sample of difference scores (from husband/wife pairs) is a random sample from a population in which the mean difference = 5 inches. In the equations for the paired t-test, \( \bar{X} \) is often replaced with \( \bar{D} \), which stands for the mean difference.
- 216. \( t = \frac{\bar{D} - \mu_{\bar{D}}}{s_{\bar{D}}} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} \), where \( \bar{D} \) = the (sample) mean of the difference scores; \( \mu_D \) = the mean difference in the population, given a true Ho (often \( \mu_D = 0 \), but not always); \( s_D \) = sample standard deviation of the difference scores (dividing by n-1); n = number of matched pairs (number of individuals = 2n); \( s_{\bar{D}} \) = SE of the mean difference; df = n - 1
- 217. Example of paired t-test A political candidate wishes to determine if endorsing increased social spending is likely to affect her standing in the polls. She has access to data on the popularity of several other candidates who have endorsed increased spending. The data was available both before and after the candidates announced their positions on the issue [see Table].
- 218. Data for paired t-test example: Popularity Ratings

  | Candidate | Before | After | Difference |
  | --- | --- | --- | --- |
  | 1 | 42 | 43 | 1 |
  | 2 | 41 | 45 | 4 |
  | 3 | 50 | 56 | 6 |
  | 4 | 52 | 54 | 2 |
  | 5 | 58 | 65 | 7 |
  | 6 | 32 | 29 | -3 |
  | 7 | 39 | 46 | 7 |
  | 8 | 42 | 48 | 6 |
  | 9 | 48 | 47 | -1 |
  | 10 | 47 | 53 | 6 |

- 219. Solution Examining the last column, we find that \( \bar{D} = 3.5 \) and \( s_D = 3.5668 \). If the null hypothesis is true, we assume that \( \mu_D = 0 \). Since n = 10, it turns out that SE = 1.1279, and thus t = 3.103 with df = 9. The null hypothesis for this test states that the mean difference in the population is zero; in other words, endorsing increased social spending has no effect on popularity ratings in the population from which we have sampled. If that is true, the probability of seeing a mean difference of 3.5 points or more in either direction is 0.0127 (the two-sided p-value, obtained using SPSS or MATLAB). Therefore, the politician would likely reject the null hypothesis and endorse increased social spending, since a difference this large would arise by chance with a probability of only 0.0127 if the null hypothesis were true.
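Because the paired t-test is a single-sample t-test on the difference scores, its arithmetic takes only a few lines of Python. This is a sketch with illustrative variable names; the ratings are those of the ten candidates in the example, and the p-value (which needs the t-distribution) is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

before = [42, 41, 50, 52, 58, 32, 39, 42, 48, 47]
after = [43, 45, 56, 54, 65, 29, 46, 48, 47, 53]
mu_d = 0  # mean difference under the null hypothesis

# One difference score per matched pair, always computed as after - before.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)     # sample SD of the differences (n - 1)
se_d = s_d / sqrt(n)         # SE of the mean difference
t = (d_bar - mu_d) / se_d
df = n - 1

print(differences, d_bar, round(s_d, 4), round(se_d, 4), round(t, 3), df)
```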
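- 220. Unpaired (or independent samples) t-test Another common form of the t-test may be used if you have 2 independent samples (or groups). The formula for this version of the test is given by: \( t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \), where \( \bar{X}_1 - \bar{X}_2 \) is the difference between the means of two (independent) samples, or the difference between group means, and \( \mu_1 - \mu_2 \) is the difference between the corresponding population means, assuming that Ho is true. The standard error is \( s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2_{pooled} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \), with \( s^2_{pooled} \) = pooled variance estimate \( = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{SS_{\text{within groups}}}{df_{\text{within groups}}} \).
- 221. \( SS_1 = \sum_{i=1}^{n_1} (X_i - \bar{X}_1)^2 \), \( SS_2 = \sum_{i=1}^{n_2} (X_i - \bar{X}_2)^2 \), n1 = sample size for Group 1, n2 = sample size for Group 2, df = n1 + n2 - 2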
- 222. Example of unpaired t-test A nurse was hired by a governmental ecology agency to investigate the impact of a lead smelter on the level of lead in the blood of children living near the smelter. Ten children were chosen at random from those living near the smelter. A comparison group of 7 children was randomly selected from those living in an area relatively free from possible lead pollution. Blood samples were taken from the children, and lead levels determined. Given the tabulated results (scores are in micrograms of lead per 100 milliliters of blood) and using α2−tailed = 0.01 , what do you conclude?
- 223. Lead Levels

  | Children Living near Smelter | Children Living in Unpolluted Area |
  | --- | --- |
  | 18 | 9 |
  | 16 | 13 |
  | 21 | 8 |
  | 14 | 15 |
  | 17 | 17 |
  | 19 | 12 |
  | 22 | 11 |
  | 24 | |
  | 15 | |
  | 18 | |

- 224. Solution The null hypothesis for this example is that the 2 groups of children are 2 random samples from populations with the same mean levels of lead concentration in the blood. Thus, \( \mu_1 - \mu_2 = 0 \). Now n1 = 10, n2 = 7, \( \bar{X}_1 = 18.4 \), and \( \bar{X}_2 = 12.1429 \). Thus, \( \bar{X}_1 - \bar{X}_2 = 6.2571 \). SS1 = 90.4 and SS2 = 60.8571. Further, df = 10 + 7 – 2 = 15. So, SS1 + SS2 ≈ 90.4 + 60.9 = 151.3, \( s^2_{pooled} = \frac{151.3}{15} = 10.0867 \), and \( s_{\bar{X}_1 - \bar{X}_2} = \sqrt{10.0867 \left( \frac{1}{10} + \frac{1}{7} \right)} = 1.565 \). Then \( t = \frac{6.2571}{1.565} = 3.998 \), and the two-tailed significance is p = 0.0012 < α2-tailed = 0.01. The null hypothesis of two equal means can be rejected in that case.
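The pooled-variance calculation for the lead-level example can be sketched in Python as follows. The helper name `sum_of_squares` and the variable names are made up for this illustration; the sketch keeps full precision throughout, so intermediate values differ slightly from the rounded figures on the slide, and the p-value is left to a t-table or statistics package.

```python
from math import sqrt
from statistics import mean

smelter = [18, 16, 21, 14, 17, 19, 22, 24, 15, 18]   # Group 1
unpolluted = [9, 13, 8, 15, 17, 12, 11]              # Group 2


def sum_of_squares(xs):
    """Sum of squared deviations from the group mean (SS)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n1, n2 = len(smelter), len(unpolluted)
ss1, ss2 = sum_of_squares(smelter), sum_of_squares(unpolluted)
df = n1 + n2 - 2
pooled_var = (ss1 + ss2) / df                     # pooled variance estimate
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))    # SE of the mean difference
mean_diff = mean(smelter) - mean(unpolluted)
t = (mean_diff - 0) / se_diff                     # Ho: mu1 - mu2 = 0

print(round(ss1, 4), round(ss2, 4), df, round(pooled_var, 4),
      round(se_diff, 3), round(t, 3))
```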
- 225. Sample Size, Precision, and Power A study that is insufficiently precise or lacks the power to reject a false null hypothesis is a waste of time and money. A study that collects too much data is also wasteful. Therefore, before collecting data, it is essential to determine the sample size requirements of a study.
- 226. Sample Size Calculation Before calculating the sample size requirements of a study you must address some questions: • The theme of learning from the study: mean, mean difference, proportion, proportion (risk) ratio, odds ratio, slope • The estimation methodology: with a given precision, with a given power • The type of sample: a single group, two or more independent groups, matched pairs
- 227. 9/8/2017 (c) 2001, Ron S. Kenett, Ph.D. 227 Parametric Statistical Inference Instructor: Ron S. Kenett Email: ron@kpa.co.il Course Website: www.kpa.co.il/biostat Course textbook: MODERN INDUSTRIAL STATISTICS, Kenett and Zacks, Duxbury Press, 1998
- 228. Null and Alternative Hypotheses In any experiment, there are two hypotheses that attempt to explain the results. They are the null hypothesis and alternative hypothesis. Alternative Hypothesis (H1 or HA). In experiments that entail manipulation of an independent variable, the alternative hypothesis states that the results of the experiment are due to the effect of the independent variable. In a coin tossing experiment, H1 would state that the biased coin had been selected, and that p(Head) = 0.15. Null Hypothesis (H0). The null hypothesis is the complement of the alternative hypothesis. In other words, if H1 is not true, then H0 must be true, and vice versa. In the foregoing coin tossing situation, H0 asserts that the fair coin was selected, and that p(Head) = 0.50.
- 229. Null and Alternative Hypotheses Thus, the decision rule to minimize the overall p(error) can be restated as follows: – if p(X | H0) > p(X | H1), then do not reject H0 – if p(X | H0) < p(X | H1), then reject H0, where X = independent random variable (usually statistic). According to statistical purists, it is only proper to reject the null hypothesis or fail to reject the null hypothesis. Acceptance of either hypothesis is strictly forbidden.
- 230. Rejection Region The rejection region is a range containing outcomes that lead to rejection of H0 .
- 231. The Logic of Hypothesis Testing Step 1. A claim is made. A new claim is asserted that challenges existing thoughts about a population characteristic. Suggestion: Form the alternative hypothesis first, since it embodies the challenge.
- 232. The Logic of Hypothesis Testing Step 2. How much error are you willing to accept? Select the maximum acceptable error, α. The decision maker must elect how much error he/she is willing to accept in making an inference about the population. The significance level of the test is the maximum probability that the null hypothesis will be rejected incorrectly, a Type I error.
- 233. The Logic of Hypothesis Testing Step 3. If the null hypothesis were true, what would you expect to see? Assume the null hypothesis is true. This is a very powerful statement. The test is always referenced to the null hypothesis. Form the rejection region, the areas in which the decision maker is willing to reject the presumption of the null hypothesis.
- 234. The Logic of Hypothesis Testing Step 4. What did you actually see? Compute the sample statistic. The sample provides a set of data that serves as a window to the population. The decision maker computes the sample statistic and calculates how far the sample statistic differs from the presumed distribution that is established by the null hypothesis.
- 235. The Logic of Hypothesis Testing Step 5. Make the decision. The decision is a conclusion supported by evidence. The decision maker will: – reject the null hypothesis if the sample evidence is so strong, the sample statistic so unlikely, that the decision maker is convinced H1 must be true. – fail to reject the null hypothesis if the sample statistic falls in the nonrejection region. In this case, the decision maker is not concluding the null hypothesis is true, only that there is insufficient evidence to dispute it based on this sample.
- 236. The Logic of Hypothesis Testing Step 6. What are the implications of the decision for future actions? State what the decision means in terms of the research program. The decision maker must draw out the implications of the decision. Is there some action triggered, some change implied? What recommendations might be extended for future attempts to test similar hypotheses?
- 237. Two Types of Errors Type I Error: Saying you reject H0 when it really is true. Rejecting a true H0. Type II Error: Saying you do not reject H0 when it really is false. Failing to reject a false H0.
- 238. What are acceptable error levels? Decision makers frequently use a 5% significance level. Use α = 0.05. An α-error means that we will decide to adjust the machine when it does not need adjustment. This means, in the case of the robot welder, if the machine is running properly, there is only a 0.05 probability of our making the mistake of concluding that the robot requires adjustment when it really does not.
- 239. Three Types of Tests – Nondirectional, two-tail test: H1: pop parameter ≠ value – Directional, right-tail test: H1: pop parameter > value – Directional, left-tail test: H1: pop parameter < value. Always put hypotheses in terms of population parameters and have H0: pop parameter = value
- 240. Two-tailed test H0: pop parameter = value H1: pop parameter ≠ value [Figure: sampling distribution with a rejection region of area α/2 in each tail, beyond –z and +z; do not reject H0 in the central region of area 1 – α]
- 241. Right-tailed test H0: pop parameter ≤ value H1: pop parameter > value [Figure: sampling distribution with a rejection region of area α in the right tail, beyond +z; do not reject H0 in the remaining region of area 1 – α]
- 242. Left-tailed test H0: pop parameter ≥ value H1: pop parameter < value [Figure: sampling distribution with a rejection region of area α in the left tail, beyond –z; do not reject H0 in the remaining region of area 1 – α]
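The critical z values that bound the rejection regions for the three kinds of test can be sketched with Python's standard library. The function name `critical_values` and the `tail` labels are illustrative assumptions, not notation from the slides; the sketch also assumes the test statistic is standard normal.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Return (lower, upper) critical z values; None means no bound on that side."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Area alpha/2 in each tail.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return -upper, upper
    if tail == "right":
        # Whole area alpha in the right tail.
        return None, std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # Whole area alpha in the left tail.
        return std_normal.inv_cdf(alpha), None
    raise ValueError("tail must be 'two', 'right' or 'left'")


lower, upper = critical_values(0.05, "two")
print(f"two-tailed, alpha = 0.05: reject H0 if z < {lower:.3f} or z > {upper:.3f}")
```

For α = 0.05 the two-tailed bounds come out near ±1.96, and the one-tailed bound near 1.645 in absolute value.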
- 243. Decision versus truth – Decide H0 when H0 is true: OK – Decide H1 when H0 is true: Type I Error – Decide H0 when H1 is true: Type II Error – Decide H1 when H1 is true: OK
- 244. Example Suppose you have two coins in your pocket. – One is a fair coin—i.e., p(Head) = p(Tail) = 0.5. – The other coin is biased toward Tails: p(Head) = .15, p(Tail) = .85.
- 245. Then place the two coins on a table, and choose one of them. You take the selected coin, and flip it 11 times, noting each time whether it showed Heads or Tails. Let X = the number of Heads observed in 11 flips. Let Hypothesis A (Null Hypothesis) be that you selected and flipped the fair coin. Let Hypothesis B (Alternate Hypothesis) be that you selected and flipped the biased coin. Under what circumstances would you decide that Hypothesis A is true? Under what circumstances would you decide that Hypothesis B is true?
- 246. A good way to start is to think about what kinds of outcomes you would expect for each hypothesis. For example, if hypothesis A is true (i.e., the coin is fair), you expect the number of Heads to be somewhere in the middle of the 0-11 range. But if hypothesis B is true (i.e., the coin is biased towards tails), you probably expect the number of Heads to be quite small. Note as well that a very large number of Heads is improbable in either case, but is even less probable if the coin is biased towards tails.
- 247. Then what is the decision? IF the number of heads is LOW, THEN decide that the coin is biased towards TAILS (Hypothesis B); ELSE decide that the coin is fair (Hypothesis A). But an obvious problem now is, how low is low? The answer is really quite simple. The key is to recognize that the variable X (the number of Heads) has a binomial distribution. Furthermore, if Hypothesis A is true, X will have a binomial distribution with N = 11, p = .5, and q = .5. But if hypothesis B is true, then X will have a binomial distribution with N = 11, p = .15, and q = .85.
- 248. Two Binomial Distributions with N = 11 and X = # of Heads
- 249. We are now in a position to compare conditional probabilities for particular experimental outcomes. For example, if we actually did carry out the coin tossing experiment and obtained 3 Heads (X=3), we would know that the probability of getting exactly 3 Heads is lower if Hypothesis A is true (.0806) than it is if Hypothesis B is true (.1517). Therefore, we might decide that Hypothesis B is true if the outcome was X = 3 Heads. But what if we had obtained 4 Heads (X=4) rather than 3? In this case the probability of exactly 4 Heads is higher if Hypothesis A is true (.1611) than it is if hypothesis B is true (.0536). So in this case, we would probably decide that Hypothesis A is true (i.e., the coin is fair).
- 250. And there is always a chance of an error! Note that even if the coin is biased towards tails, it is possible for the number of Heads to be very large; and if the coin is fair, it is possible to observe very few Heads. No matter which hypothesis we choose, therefore, there is always the possibility of making an error. However, the use of the decision rule described here will minimize the overall probability of error. In the present example, this rule would lead us to decide that the coin is biased if the number of Heads was 3 or less; but for any other outcome, we would conclude that the coin is fair.
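The decision rule for the coin example can be reproduced numerically. The sketch below uses only the values stated in the example (N = 11, p(Head) = 0.5 for the fair coin and 0.15 for the biased coin); the names `binom_pmf`, `rejection_region`, `alpha` and `beta` are illustrative choices.

```python
from math import comb

N = 11           # number of flips
P_FAIR = 0.5     # p(Head) under Hypothesis A (null)
P_BIASED = 0.15  # p(Head) under Hypothesis B (alternative)


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Reject the null for every outcome that is more probable under the alternative.
rejection_region = [
    x for x in range(N + 1)
    if binom_pmf(x, N, P_FAIR) < binom_pmf(x, N, P_BIASED)
]

# Type I error: landing in the rejection region although the coin is fair.
alpha = sum(binom_pmf(x, N, P_FAIR) for x in rejection_region)
# Type II error: landing outside the rejection region although the coin is biased.
beta = sum(
    binom_pmf(x, N, P_BIASED) for x in range(N + 1) if x not in rejection_region
)

print("Reject H0 when X is in", rejection_region)
print(f"p(Type I error) = {alpha:.4f}, p(Type II error) = {beta:.4f}")
```

The rejection region comes out as 0 to 3 Heads, matching the rule stated in the text, and the two sums give the error probabilities that the rule trades off.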
- 251. Decision rule to minimize the overall probability of error Rejection region
- 252. What Test to Apply? Ask the following questions: – Are the data the result of a measurement (a continuous variable) or a count (a discrete variable)? – Is σ known? – What shape is the distribution of the population parameter? – What is the sample size?
- 254. Sample Size Requirements for Estimating a Mean or Mean Difference $n = \frac{4s^2}{d^2}$ where n = sample size, s = standard deviation, d = margin of error (at 95% confidence level). For paired samples $n = \frac{4s_d^2}{d^2}$ where $s_d$ = standard deviation of the DELTA variable
- 255. For independent samples $n = \frac{4s_p^2}{d^2}$ where $s_p$ = pooled estimate of standard deviation
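The sample size rule $n = 4s^2/d^2$ is simple enough to wrap in a helper. This is a minimal sketch: the function name `sample_size` is hypothetical, and rounding up to the next whole subject is an assumption that the slides do not state. The same function covers the paired and independent cases by passing $s_d$ or $s_p$ as the standard deviation.

```python
from math import ceil


def sample_size(sd, margin):
    """Sample size n = 4 * sd^2 / margin^2, rounded up (95% confidence level)."""
    if sd < 0 or margin <= 0:
        raise ValueError("sd must be non-negative and margin must be positive")
    return ceil(4 * sd**2 / margin**2)


# Hypothetical inputs: standard deviation 15, desired margin of error 4.
print(sample_size(15, 4))
```

Halving the margin of error quadruples the required sample size, which is why the precision target should be fixed before data collection.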
- 256. Test of µ, σ Known, Population Normally Distributed Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ where $\bar{x}$ is the sample statistic, µ0 is the value identified in the null hypothesis, σ is known, and n is the sample size.
- 257. Test of µ, σ Known, Population Not Normally Distributed If n > 30, Test Statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ If n < 30, use a distribution-free test.
- 258. Test of µ, σ Unknown, Population Normally Distributed Test Statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ where $\bar{x}$ is the sample statistic, µ0 is the value identified in the null hypothesis, σ is unknown, n is the sample size, and the degrees of freedom on t are n – 1.
- 259. Test of µ, σ Unknown, Population Not Normally Distributed If n > 30, Test Statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ If n < 30, use a distribution-free test.
- 260. Observed Significance Levels A p-Value is: – the exact level of significance of the test statistic. – the smallest value α can be and still allow us to reject the null hypothesis. – the amount of area left in the tail beyond the test statistic for a one-tailed hypothesis test or twice the amount of area left in the tail beyond the test statistic for a two-tailed test. – the probability of getting a test statistic from another sample that is at least as far from the hypothesized mean as this sample statistic is.
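For the σ-known case, the z statistic and its p-value (tail area beyond the statistic, doubled for a two-tailed test) can be sketched as follows. The function name `z_test`, the `tail` labels and the numbers in the usage line are all hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a test of the mean with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    area_below = NormalDist().cdf(z)
    if tail == "two":
        # Twice the area in the tail beyond the test statistic.
        p_value = 2 * min(area_below, 1 - area_below)
    elif tail == "right":
        p_value = 1 - area_below
    elif tail == "left":
        p_value = area_below
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, p_value


# Hypothetical sample: mean 52 from n = 100, testing mu0 = 50 with sigma = 10.
z, p = z_test(52, 50, 10, 100)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```

With these made-up inputs z is 2.0 and the two-tailed p-value is a little under 0.05, so H0 would be rejected at α = 0.05 but not at α = 0.01.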
- 261. Several Samples Independent Samples: Testing a company’s claim that its peanut butter contains less fat than that produced by a competitor. Dependent Samples: Testing the relative fuel efficiency of 10 trucks that run the same route twice, once with the current air filter installed and once with the new filter.
- 262. Test of (µ1 – µ2), σ1 = σ2, Populations Normal Test Statistic: $t = \frac{[\bar{x}_1 - \bar{x}_2] - [\mu_1 - \mu_2]}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and degrees of freedom on t = n1 + n2 – 2
- 263. Example: Comparing Two Populations Hypothesis: The mean of population 1 is equal to the mean of population 2. H0: pop1 = pop2 H1: pop1 ≠ pop2 Assumption: (1) Both distributions are normal (2) σ1 = σ2 Test Statistic: t-distribution with df = n1 + n2 – 2, $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$
- 264. Example: Comparing Two Populations [Figure: t-distribution densities t(x; ν) for ν = 5 and ν = 50, with a rejection region in each tail] t-distribution with df = n1 + n2 – 2, $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$
- 265. Test of (µ1 – µ2), σ1 ≠ σ2, Populations Normal, large n Test Statistic: $z = \frac{[\bar{x}_1 - \bar{x}_2] - [\mu_1 - \mu_2]_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ with $s_1^2$ and $s_2^2$ as estimates for $\sigma_1^2$ and $\sigma_2^2$
- 266. Test of Dependent Samples (µ1 – µ2) = µd Test Statistic: $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$ where d = (x1 – x2), $\bar{d} = \Sigma d / n$ is the average difference, n = the number of pairs of observations, sd = the standard deviation of d, and df = n – 1
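A paired (dependent samples) t statistic can be sketched as below. The function name `paired_t` and the two measurement lists are made up purely for illustration; they are not data from the slides. The hypothesized mean difference defaults to zero.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, mu_d=0.0):
    """Return (t, df) for the dependent-samples t-test on paired observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences d = x1 - x2.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return (d_bar - mu_d) / (s_d / sqrt(n)), n - 1


# Hypothetical paired measurements (e.g. the same units measured twice).
current = [10, 12, 14, 16]
new = [9, 10, 13, 13]
t_paired, df_paired = paired_t(current, new)
print(f"t = {t_paired:.3f} on {df_paired} degrees of freedom")
```

Because each pair is reduced to a single difference, the test has n – 1 degrees of freedom, where n is the number of pairs rather than the number of observations.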
- 267. Test of Equal Variances Pooled-variances t-test assumes the two population variances are equal. The F-test can be used to test that assumption. The F-distribution is the sampling distribution of $s_1^2/s_2^2$ that would result if two samples were repeatedly drawn from a single normally distributed population.
- 268. Test of $\sigma_1^2 = \sigma_2^2$ If $\sigma_1^2 = \sigma_2^2$, then $\sigma_1^2/\sigma_2^2 = 1$. So the hypotheses can be worded either way. Test Statistic: $F = \frac{s_1^2}{s_2^2}$ or $\frac{s_2^2}{s_1^2}$ (whichever is larger). The critical value of the F will be F(α/2, ν1, ν2) where α = the specified level of significance, ν1 = (n – 1), where n is the size of the sample with the larger variance, and ν2 = (n – 1), where n is the size of the sample with the smaller variance
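The F ratio and its two degrees of freedom can be computed in a few lines. As a sketch, the code below applies the rule (larger sample variance in the numerator) to the lead-level data from the unpaired t-test example earlier in the deck; the function name `f_ratio` is an illustrative choice, and the critical value still has to be looked up in an F table.

```python
from statistics import variance

# Lead levels from the unpaired t-test example.
near_smelter = [18, 16, 21, 14, 17, 19, 22, 24, 15, 18]
unpolluted = [9, 13, 8, 15, 17, 12, 11]


def f_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # n - 1 divisor
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


f_stat, df_num, df_den = f_ratio(near_smelter, unpolluted)
print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) degrees of freedom")
```

For these data the two sample variances are almost identical, so F is very close to 1, which is consistent with using the pooled-variance t-test in that example.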
- 269. Confidence Interval for (µ1 – µ2) The 100(1 – α)% confidence interval for the difference in two means: – Equal variances, populations normal: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ – Unequal variances, large samples: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- 270. Summary – Hypothesis: The mean of population 1 is equal to the mean of population 2. Assumption: (1) Both distributions are normal (2) σ1 = σ2. Test Statistic: t distribution with df = n1 + n2 – 2, $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$ – Hypothesis: The standard deviation of population 1 is equal to the standard deviation of population 2. Assumption: Both distributions are normal. Test Statistic: F distribution with df1 = n1 – 1 and df2 = n2 – 1, $F = \frac{s_1^2}{s_2^2}$ – Hypothesis: The proportion of errors in population 1 is equal to the proportion of errors in population 2. Assumption: n1p1 and n2p2 > 5 (approximation by normal distribution). Test Statistic: Z, Normal distribution, $Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{p_{avg}(1 - p_{avg})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $p_{avg} = \frac{X_1 + X_2}{n_1 + n_2}$
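The two-proportion Z statistic from the summary can be sketched as follows. The function name `two_proportion_z` and the error counts in the usage line are hypothetical; X1 and X2 are taken to be the number of errors observed in samples of size n1 and n2.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled (average) proportion under the null hypothesis.
    p_avg = (x1 + x2) / (n1 + n2)
    se = sqrt(p_avg * (1 - p_avg) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts: 30 errors in 100 items versus 20 errors in 100 items.
z_prop = two_proportion_z(30, 100, 20, 100)
print(f"Z = {z_prop:.3f}")
```

With these made-up counts Z is about 1.63, inside the two-tailed non-rejection region for α = 0.05.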
- 273. A Case Study Consider the development of a placement machine that picks components from a tray and positions them on printed circuit boards. The customer requirements involve precision in the x-y position. The developers of the system collected data from 26 boards, with 16 components on each. For each board the deviations in x and y, from the required nominal values, were recorded, producing 416 values for x_dev and y_dev.
- 274. A Case Study Figure 1: Scatter plot of y deviations versus x deviations (axes: x_dev, y_dev)
- 275. A Case Study Figure 2: Scatter plot of y deviations versus x deviations with coding variable (codes 1, 2, 3; axes: x_dev, y_dev)
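- 276. The Correlation Coefficient $r = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\sigma_X \sigma_Y}$ Data (X, Y, XY): (2, 2, 4), (3, 1, 3), (1, 2, 2), (4, 3, 12), (3, 5, 15), (5, 4, 20). Mean: X = 3.00, Y = 2.83, XY = 9.33. StDev: X = 1.41, Y = 1.47. r = (9.33 - 3.00*2.83) / (1.41*1.47) = 0.41. Note that the tabulated standard deviations use the n – 1 divisor while the means in the numerator divide by n; with a consistent divisor, $r = 5 / \sqrt{10 \times 10.83} \approx 0.48$.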
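The correlation coefficient for the six (X, Y) pairs on this slide can be computed directly from deviations about the means, which uses the same divisor in the numerator and the denominator. This is a sketch; the function name `pearson_r` is an illustrative choice.

```python
from math import sqrt

# The six (X, Y) pairs from the slide's table.
x = [2, 3, 1, 4, 3, 5]
y = [2, 1, 2, 3, 5, 4]


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

Computed this way r is about 0.48; the 0.41 on the slide results from combining means that divide by n with standard deviations that divide by n – 1.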
- 277. Coefficient of Correlation A measure of the: – Direction of the linear relationship between x and y. If x and y are directly related, r > 0. If x and y are inversely related, r < 0. – Strength of the linear relationship between x and y. The larger the absolute value of r, the more the value of y depends in a linear way on the value of x.
- 278. The Linear Regression Model Data (X, Y): (7.6, 5.6), (4.2, 3.9), (10.2, 6.7), (13.8, 7.9), (13.6, 7.7), (14.5, 8.4), (3.2, 3.2), (7.9, 5.9), (13.4, 7.1), (4.7, 4.9), (5.9, 4.4), (3.5, 3.7), (3.4, 3.2), (5.0, 3.1), (5.6, 4.5). Fitted model: Y = 2.07 + 0.424*X + e, R² = 94%. [Figure: scatter plot of Y versus X with the predicted Y line and confidence limits]
- 279. Simple Linear Regression Model Probabilistic Model: $y_i = b_0 + b_1 x_i + e_i$ where $y_i$ = a value of the dependent variable, y; $x_i$ = a value of the independent variable, x; $b_0$ = the y-intercept of the regression line; $b_1$ = the slope of the regression line; $e_i$ = random error, the residual. Deterministic Model: $\hat{y}_i = b_0 + b_1 x_i$ where $b_0$ and $b_1$ are the estimated coefficients and $\hat{y}_i$ is the predicted value of y, in contrast to the actual value of y.
- 280. Determining the Least Squares Regression Line Least Squares Regression Line: $\hat{y} = b_0 + b_1 x$ Slope: $b_1 = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sum x_i^2 - n\,\bar{x}^2}$ y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$
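The slope and intercept formulas can be applied to the 15 (X, Y) pairs listed on the linear regression model slide. The sketch below assumes those pairs were read off correctly from the slide's data table; the function name `least_squares` is an illustrative choice.

```python
# (X, Y) pairs transcribed from the data table on the regression model slide.
data = [
    (7.6, 5.6), (4.2, 3.9), (10.2, 6.7), (13.8, 7.9), (13.6, 7.7),
    (14.5, 8.4), (3.2, 3.2), (7.9, 5.9), (13.4, 7.1), (4.7, 4.9),
    (5.9, 4.4), (3.5, 3.7), (3.4, 3.2), (5.0, 3.1), (5.6, 4.5),
]


def least_squares(points):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    # Slope and intercept exactly as in the formulas above.
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar**2)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(data)
print(f"y-hat = {b0:.2f} + {b1:.3f} * x")
```

The fitted coefficients agree with the slide's Y = 2.07 + 0.424*X to within rounding.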
- 281. Coefficient of Determination (R²) A measure of the: – Strength of the linear relationship between x and y. The larger the value of R², the more the value of y depends in a linear way on the value of x. – Amount of variation in y that is related to variation in x. – Ratio of variation in y that is explained by the regression model divided by the total variation in y.
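The "explained variation over total variation" reading of the coefficient of determination can be sketched on the same 15 (X, Y) pairs from the regression model slide. The function name `r_squared` is an illustrative choice, and the data are assumed to be transcribed correctly from the slide.

```python
# (X, Y) pairs transcribed from the data table on the regression model slide.
points = [
    (7.6, 5.6), (4.2, 3.9), (10.2, 6.7), (13.8, 7.9), (13.6, 7.7),
    (14.5, 8.4), (3.2, 3.2), (7.9, 5.9), (13.4, 7.1), (4.7, 4.9),
    (5.9, 4.4), (3.5, 3.7), (3.4, 3.2), (5.0, 3.1), (5.6, 4.5),
]


def r_squared(pairs):
    """Share of the total variation in y explained by the least squares line."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Unexplained variation: squared residuals about the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs)
    # Total variation: squared deviations about the mean of y.
    sst = sum((y - y_bar) ** 2 for _, y in pairs)
    return 1 - sse / sst


r2 = r_squared(points)
print(f"R^2 = {r2:.0%}")
```

For these data the result rounds to the 94% reported on the slide.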