1.
Spatial Statistics (SGG 2413)
Descriptive Statistics
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Director, Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia
Skudai, Johor
Spatial Statistics: Topic 3
2.
Learning Objectives
Overall: To give students a basic understanding of descriptive statistics
Specific: Students will be able to:
* understand the basic concept of descriptive statistics
* understand the concept of distribution
* calculate measures of central tendency and dispersion
* calculate measures of kurtosis and skewness
4.
Descriptive Statistics
Use sample information to explain/make abstraction of population “phenomena”.
Common “phenomena”:
* Association (e.g. σ1,2.3 = 0.75)
* Tendency (left-skew, right-skew)
* Trend, pattern, location, dispersion, range
* Causal relationship (e.g. if X then Y)
Emphasis on meaningful characterisation of data (e.g. central tendency, variability), graphics, and description.
Use non-parametric analysis (e.g. χ², t-test, 2-way ANOVA).
5.
E.g. of abstraction of phenomena
[Figure: Trends in property loan, shop house demand & supply, 1990 - 1997]
Year                                     1990     1991     1992     1993     1994      1995      1996      1997
Loan to property sector (RM million)  32635.8  38100.6  42468.1  47684.7  48408.2   61433.6   77255.7   97810.1
Demand for shop houses (units)          71719    73892    85843    95916   101107    117857    134864     86323
Supply of shop houses (units)           85534    85821    90366   101508   111952    125334    143530    154179
[Figure: No. of houses by district, 1991 and 2000]
[Figure: Proportion (%) by age category (years old)]
[Figure: Price (RM/sq.ft. built area) against demand (% sales success)]
6.
Inferential Statistics
Using sample statistics to infer some “phenomena” of population parameters.
Common “phenomena”: cause-and-effect
* One-way r/ship: Y = f(X)
* Feedback r/ship: Y1 = f(Y2, X, e1); Y2 = f(Y1, Z, e2)
* Recursive: Y1 = f(X, e1); Y2 = f(Y1, Z, e2)
Use parametric analysis (e.g. α and β) through regression analysis.
Emphasis on hypothesis testing.
7.
Parametric statistics
* Statistical analysis that attempts to explain the population parameter using a sample
* E.g. of statistical parameters: mean, variance, std. dev., R², t-value, F-ratio, ρxy, etc.
* It assumes that the distributions of the variables being assessed belong to known parameterised families of probability distributions
8.
Examples of parametric relationship
Fitted lines (from the figure): Dep = 9t – 215.8; Dep = 7t – 192.6
Coefficients(a)
                Unstandardized Coefficients    Standardized Coefficients
Model           B           Std. Error         Beta                         t         Sig.
1 (Constant)    1993.108    239.632                                         8.317     .000
  Tanah         -4.472      1.199              -.190                        -3.728    .000
  Bangunan      6.938       .619               .705                         11.209    .000
  Ansilari      4.393       1.807              .139                         2.431     .017
  Umur          -27.893     6.108              -.241                        -4.567    .000
  Flo_go        34.895      89.440             .020                         .390      .697
a. Dependent Variable: Nilaism
9.
Non-parametric statistics
* First used by Wolfowitz (1942)
* Statistical analysis that attempts to explain the population parameter using a sample without making assumptions about the frequency distribution of the assessed variable
* In other words, the variable being assessed is distribution-free
* E.g. of non-parametric statistics: histogram, stochastic kernel, non-parametric regression
10.
Descriptive & Inferential Statistics (DS & IS)
* DS gathers information about a population characteristic (e.g. income) and describes it with a parameter of interest (e.g. mean)
* IS uses the parameter to test a hypothesis pertaining to that characteristic. E.g.
  Ho: mean income = RM 4,000
  H1: mean income < RM 4,000
* The result of hypothesis testing is used to make inference about the characteristic of interest (e.g. Malaysian → upper middle income)
11.
Sample Statistics: Central Tendency
Mean (sum of all values ÷ no. of values)
Advantages:
* Best known average
* Exactly calculable
* Makes use of all data
* Useful for statistical analysis
Disadvantages:
* Affected by extreme values
* Can be absurd for discrete data (e.g. family size = 4.5 persons)
* Cannot be obtained graphically
Median (middle value)
Advantages:
* Not influenced by extreme values
* Obtainable even if data distribution unknown (e.g. group/aggregate data)
* Unaffected by irregular class width
* Unaffected by open-ended class
Disadvantages:
* Needs interpolation for group/aggregate data (cumulative frequency curve)
* May not be characteristic of group when: (1) items are only few; (2) distribution irregular
* Very limited statistical use
Mode (most frequent value)
Advantages:
* Unaffected by extreme values
* Easy to obtain from histogram
* Determinable from only values near the modal class
Disadvantages:
* Cannot be determined exactly in group data
* Very limited statistical use
12.
Central Tendency – Mean
For individual observations, $\bar{x} = \sum x / n$.
E.g. X = {3,5,7,7,8,8,8,9,9,10,10,12}
$\sum x = 96$; $n = 12$
Thus, $\bar{x} = 96/12 = 8$
The above observations can be organised into a frequency table and the mean calculated on the basis of frequencies:
x     3   5   7   8   9   10   12
f     1   1   2   3   2    2    1
fx    3   5  14  24  18   20   12
$\sum fx = 96$; $\sum f = 12$
Thus, $\bar{x} = \sum fx / \sum f = 96/12 = 8$
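The two routes to the mean on this slide can be checked with a short Python sketch. The variable names are illustrative only; the data are the twelve observations and the frequency table given above.

```python
# Raw observations from the slide
x = [3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12]
mean_raw = sum(x) / len(x)

# The same data as a frequency table: value -> frequency
freq = {3: 1, 5: 1, 7: 2, 8: 3, 9: 2, 10: 2, 12: 1}
sum_fx = sum(value * f for value, f in freq.items())  # sum of f*x
sum_f = sum(freq.values())                            # sum of f
mean_freq = sum_fx / sum_f

print(mean_raw, mean_freq)
```

Both calculations divide the same total by the same count, so they must agree.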
13.
Central Tendency - Mean and Mid-point
Let's say we have data like this:
Price (RM '000/unit) of Shop Houses in Skudai
Location   Min   Max
Town A     228   450
Town B     320   430
Can you calculate the mean?
14.
Central Tendency - Mean and Mid-point (contd.)
Let's calculate: M = ½(Min + Max)
Town A: (228+450)/2 = 339
Town B: (320+430)/2 = 375
Are these figures means?
15.
Central Tendency - Mean and Mid-point (contd.)
Let's say we have price data as follows:
Town A: 228, 295, 310, 420, 450
Town B: 320, 295, 310, 400, 430
Calculate the means for Town A and Town B.
Are the results the same as previously?
⇒ Be careful about mean and “mid-point”!
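A minimal sketch of the comparison the slides ask for. The mid-points use the Min/Max table exactly as given earlier (they are not recomputed from the price lists), and the names `stated_range` and `prices` are illustrative.

```python
# Min/Max as tabulated on the earlier slide (RM '000/unit)
stated_range = {"Town A": (228, 450), "Town B": (320, 430)}

# Individual prices as listed on this slide (RM '000/unit)
prices = {
    "Town A": [228, 295, 310, 420, 450],
    "Town B": [320, 295, 310, 400, 430],
}

# Mid-point uses only the two extremes; the mean uses every observation
mid_points = {town: (lo + hi) / 2 for town, (lo, hi) in stated_range.items()}
means = {town: sum(values) / len(values) for town, values in prices.items()}

for town in prices:
    print(town, "mid-point:", mid_points[town], "mean:", means[town])
```

The mid-point ignores everything between the extremes, which is why it generally differs from the mean.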
16.
Central Tendency – Mean of Grouped Data
House rentals or prices in the PMR are frequently tabulated as a range of values. E.g.
Rental (RM/month)      135-140   140-145   145-150   150-155   155-160
Mid-point value (x)      137.5     142.5     147.5     152.5     157.5
Number of Taman (f)          5         9         6         2         1
fx                       687.5    1282.5     885.0     305.0     157.5
What is the mean rental across the areas?
$\sum f = 23$; $\sum fx = 3317.5$
Thus, $\bar{x} = 3317.5/23 = 144.24$
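The grouped-data calculation can be sketched in Python as follows. Each class is represented by its mid-point, as on the slide; the tuple layout (lower bound, upper bound, frequency) is just a convenient assumption for this example.

```python
# (lower bound, upper bound, number of Taman) for each rental class, RM/month
classes = [
    (135, 140, 5),
    (140, 145, 9),
    (145, 150, 6),
    (150, 155, 2),
    (155, 160, 1),
]

sum_f = sum(f for _, _, f in classes)
# Each class contributes frequency times its mid-point
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
grouped_mean = sum_fx / sum_f

print(sum_f, sum_fx, round(grouped_mean, 2))
```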
17.
Central Tendency – Median
Let's say house rentals in a particular town are tabulated:
Rental (RM/month)      130-135   135-140   140-145   145-150   150-155
Number of Taman (f)          3         5         9         6         2
Rental (RM/month)        < 135     < 140     < 145     < 150     < 155
Cumulative frequency         3         8        17        23        25
Calculation of “median” rental needs a graphical aid (ogive):
1. Median position = (n+1)/2 = (25+1)/2 = 13th Taman
2. (i.e. between 10 – 15 points on the vertical axis of the ogive)
3. Corresponds to RM 140-145/month on the horizontal axis
4. There are (17-8) = 9 Taman in the range of RM 140-145/month
5. The 13th Taman is the 5th out of these 9 Taman
6. The rental interval width is 5
7. Therefore, the median rental can be calculated as: 140 + (5/9 x 5) = RM 142.8
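The interpolation steps above can be sketched as a small Python function. The function name `value_at_position` and the tuple layout are assumptions made for this example; the logic is the linear interpolation within the class that contains the required position.

```python
def value_at_position(classes, position):
    """Linearly interpolate the value at an ordinal position in grouped data.

    classes: list of (lower bound, upper bound, frequency), in ascending order.
    """
    cumulative = 0
    for lower, upper, f in classes:
        if position <= cumulative + f:
            # How far into this class the position falls, as a fraction of f
            fraction = (position - cumulative) / f
            return lower + fraction * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

# Rental classes (RM/month) and number of Taman from the slide
classes = [(130, 135, 3), (135, 140, 5), (140, 145, 9), (145, 150, 6), (150, 155, 2)]
n = sum(f for _, _, f in classes)

median_position = (n + 1) / 2
median_rental = value_at_position(classes, median_position)
print(median_position, round(median_rental, 1))
```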
18.
Central Tendency – Median (contd.)
19.
Central Tendency – Quartiles (contd.)
Following the same process as in calculating the “median”:
Upper quartile position = ¾(n+1) = 19.5th Taman, i.e. 2.5 Taman into the 6 Taman of the RM 145-150 class
UQ = 145 + (2.5/6 x 5) = RM 147.1/month
Lower quartile position = (n+1)/4 = 26/4 = 6.5th Taman, i.e. 3.5 Taman into the 5 Taman of the RM 135-140 class
LQ = 135 + (3.5/5 x 5) = RM 138.5/month
Inter-quartile range = UQ – LQ = 147.1 – 138.5 = RM 8.6/month
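A sketch of the quartile calculation, using the same grouped rental data and linear interpolation within the class that contains each quartile position. The helper name `value_at_position` is illustrative.

```python
def value_at_position(classes, position):
    """Linearly interpolate the value at an ordinal position in grouped data."""
    cumulative = 0
    for lower, upper, f in classes:
        if position <= cumulative + f:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

# Rental classes (RM/month) and number of Taman from the median slide
classes = [(130, 135, 3), (135, 140, 5), (140, 145, 9), (145, 150, 6), (150, 155, 2)]
n = sum(f for _, _, f in classes)

lq = value_at_position(classes, (n + 1) / 4)       # 6.5th Taman
uq = value_at_position(classes, 3 * (n + 1) / 4)   # 19.5th Taman
iqr = uq - lq                                      # inter-quartile range

print(round(lq, 1), round(uq, 1), round(iqr, 1))
```

The 19.5th Taman lies 2.5 positions into a class of 6, so the interpolation fraction for the upper quartile is 2.5/6.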
20.
Variability
Indicates dispersion, spread, variation, deviation
For single population or sample data:
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$ and $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where σ² and s² = population and sample variance respectively, xi = individual observations, μ = population mean, $\bar{x}$ = sample mean, and n = total number of individual observations.
The square roots are: $\sigma = \sqrt{\sigma^2}$ (population standard deviation) and $s = \sqrt{s^2}$ (sample standard deviation)
21.
Variability (contd.)
Why is a “measure of dispersion” important?
Consider yields of two plant species:
* Plant A (ton) = {1.8, 1.9, 2.0, 2.1, 3.6}
* Plant B (ton) = {1.0, 1.5, 2.0, 3.0, 3.9}
Mean A = mean B = 2.28 ton
But, different variability! Var(A) = 0.557, Var(B) = 1.367
* Would you choose to grow plant A or B?
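The figures on this slide can be reproduced with Python's standard `statistics` module. Note that `statistics.variance` divides by n - 1 (sample variance), which is the form that matches the 0.557 and 1.367 quoted above; `statistics.pvariance` would divide by n instead.

```python
import statistics

plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]  # yields in ton
plant_b = [1.0, 1.5, 2.0, 3.0, 3.9]

mean_a = statistics.mean(plant_a)
mean_b = statistics.mean(plant_b)

# Sample variance: sum of squared deviations divided by (n - 1)
var_a = statistics.variance(plant_a)
var_b = statistics.variance(plant_b)

print(round(mean_a, 2), round(mean_b, 2))
print(round(var_a, 3), round(var_b, 3))
```

Equal means with unequal variances is exactly the situation in which a measure of central tendency alone would hide the difference between the two species.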
22.
Variability (contd.)
Coefficient of variation – CV – std. deviation as % of the mean: $CV = \frac{s}{\bar{x}} \times 100$
A better measure compared to std. dev. in cases where samples have different means. E.g.
* Plant X (ton/ha) = {1.2, 1.4, 2.6, 2.7, 3.9}
* Plant Y (ton/ha) = {1.4, 1.5, 2.1, 3.2, 3.9}
23.
Variability (cont.)
Yield (ton/ha)
Farm No.    Species X   Species Y
1                 1.2         1.4
2                 1.4         1.5
3                 2.6         2.1
4                 2.7         3.2
5                 3.9         3.9
Mean             2.36        2.42
Var.             1.20        1.20
Std. dev.        1.10        1.09
Calculate CV for both species. CV uses the standard deviation (the square root of the variance), not the variance itself:
CVx = (√1.203/2.36) x 100 ≈ 46.5%
CVy = (√1.197/2.42) x 100 ≈ 45.2%
∴ Species X is a little more variable than species Y
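A minimal sketch of the CV calculation, following the definition given earlier (standard deviation as a percentage of the mean) and using the sample standard deviation from Python's `statistics` module. The function name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

species_x = [1.2, 1.4, 2.6, 2.7, 3.9]  # yields in ton/ha
species_y = [1.4, 1.5, 2.1, 3.2, 3.9]

cv_x = coefficient_of_variation(species_x)
cv_y = coefficient_of_variation(species_y)

print(round(statistics.variance(species_x), 3), round(statistics.variance(species_y), 3))
print(round(cv_x, 1), round(cv_y, 1))
```

Because the CV is unit-free, it allows the spread of the two species to be compared even though their means differ.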
24.
Variability (cont.)
Std. dev. of a frequency distribution
E.g. age distribution of second-home buyers (SHB):
25.
Probability distribution
* Logical probability: If there are 20 lecturers, the probability that A becomes a professor is: p = 1/20 = 0.05
* Experiential probability: Out of 100 births, half of them were girls (p = 0.5); as the number increased to 1,000, two-thirds were girls (p = 0.67); but from a record of 10,000 new-born babies, three-quarters were girls (p = 0.75)
* Subjective probability: The probability of a drug addict recovering from addiction is 50:50
General rule:
Pr (event X) = (No. of times event X occurs) / (Total number of occurrences)
The probability of a certain event X occurring has a specific form of distribution
26.
Probability Distribution
Classical example of tossing two dice (sum of Dice 1 and Dice 2):
                 Dice 1
Dice 2     1    2    3    4    5    6
1          2    3    4    5    6    7
2          3    4    5    6    7    8
3          4    5    6    7    8    9
4          5    6    7    8    9   10
5          6    7    8    9   10   11
6          7    8    9   10   11   12
What is the distribution of the sum of tosses?
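The question can be answered by counting how often each sum appears among the 36 equally likely outcomes in the table. The sketch below uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Count each possible sum over all 36 (dice 1, dice 2) outcomes
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
total_outcomes = sum(counts.values())

# Probability of each sum = occurrences / total outcomes
distribution = {s: Fraction(c, total_outcomes) for s, c in sorted(counts.items())}

for s, p in distribution.items():
    print(s, p)
```

The probabilities rise from 1/36 at a sum of 2 to a peak of 1/6 at a sum of 7, then fall symmetrically back to 1/36 at 12.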
27.
Probability Distribution (contd.)
Discrete variable
* Values of x are discrete (discontinuous)
* Sum of lengths of vertical bars: Σp(X=x) = 1 over all x
28.
Probability Distribution (cont.)
Continuous variable
Age      Freq    Prob.
36          3    0.015
37         14    0.070
38         10    0.050
39         36    0.180
40         73    0.365
41         27    0.135
42         20    0.100
43         17    0.085
Total     200    1.000
Mean = 39.5; Std. dev = 2.45
Pr (Area under curve) = 1
Age distribution of second-home buyers in probability histogram
30.
Probability Distribution (cont.)
[Figures: larger sample; very large sample]
* As larger and larger samples are drawn, the probability distribution gets smoother
* Tens of different types of probability distribution: Z, t, F, gamma, etc.
* Most important: normal distribution
31.
Normal Distribution - ND
Salient features of ND:
* Bell-shaped, symmetrical
* Total area under curve = 1
* Area under curve between any two points = prob. of values in that range (shaded area)
* Prob. of any exact value = 0
* Has a function of: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ = mean of variable x; σ = std. dev. of x; π = ratio of circumference of a circle to its diameter = 3.14; e = base of natural log = 2.71828.
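The features listed above can be checked numerically. The sketch below codes the standard normal density function and approximates areas with the trapezoidal rule; the mean and standard deviation (39.3 and 2.42) are borrowed from the home-buyers example later in the slides purely for illustration, and the function names are assumptions of this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, lower, upper, steps=10_000):
    """Trapezoidal approximation of the area under the density on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * width, mu, sigma)
    return total * width

mu, sigma = 39.3, 2.42  # illustrative values only

total_area = area_under_curve(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
one_sigma_area = area_under_curve(mu, sigma, mu - sigma, mu + sigma)

print(round(total_area, 4), round(one_sigma_area, 4))
```

The total area comes out at approximately 1, and the area within one standard deviation of the mean at roughly 0.68.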
32.
Normal Distribution - ND
[Figure: two normal curves, Population 1 (µ1, σ1) and Population 2 (µ2, σ2)]
* µ determines location while σ determines shape of ND
* A larger population has narrower base (smaller variance)
33.
Normal Distribution (cont.)
* Has a mean µ and a variance σ², i.e. X ∼ N(µ, σ²)
* Has the following distribution of observation:
“Home-buyers example…” Mean age = 39.3; Std. dev = 2.42
34.
Standard Normal Distribution (SND)
* Since different populations have different µ and σ (thus, locations and shapes of distribution), they have to be standardised.
* Most common standardisation: standard normal distribution (SND), also called Z-distribution
* φ(X=x) is given by area under curve
* Has no standard algebraic method of integration → Z ~ N(0,1)
* To transform f(x) into f(z): Z = (x - µ)/σ ~ N(0, 1)
35.
Z-Distribution
Probability is distributed such that:
* Approx. 68%: -1 < z < 1
* Approx. 95%: -1.96 < z < 1.96
* Approx. 99%: -2.58 < z < 2.58
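These three coverage figures can be verified without a Z-table. The sketch below builds the standard normal cumulative distribution from `math.erf`, a standard identity; the function names are illustrative.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def central_probability(z):
    """P(-z < Z < z) for z > 0."""
    return phi(z) - phi(-z)

coverage = {z: central_probability(z) for z in (1.0, 1.96, 2.58)}
for z, p in coverage.items():
    print(z, round(p, 4))
```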
36.
Z-distribution (cont.)
* When X = μ, Z = 0
* When X = μ + σ, Z = 1
* When X = μ + 2σ, Z = 2
* When X = μ + 3σ, Z = 3, and so on.
* It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)
* SND shows the probability to the right of any particular value of Z.
37.
Normal distribution… Questions
A study found that the mean age, A, of second-home buyers in Johor Bahru is 39.3 years old with a standard deviation of 2.4 years. Assuming normality, how sure are you that the age is: (a) ≥ 40 years old; (b) 39 to 42 years old?
Answer
(a) P(A ≥ 40) = P[Z ≥ (40 – 39.3)/2.4] = P(Z ≥ 0.2917 ≈ 0.30) = 0.3821
(b) P(39 ≤ A ≤ 42) = P(A ≥ 39) – P(A ≥ 42)
    P(A ≥ 39) = P[Z ≥ (39 – 39.3)/2.4] = P(Z ≥ -0.125 ≈ -0.12) = 1 – 0.45224 = 0.54776
    P(A ≥ 42) = P[Z ≥ (42 – 39.3)/2.4] = P(Z ≥ 1.125 ≈ 1.13) = 0.12924
    P(39 ≤ A ≤ 42) = 0.54776 – 0.12924 = 0.4185
Use Z-table!
Always remember: to convert to SND, subtract the mean and divide by the std. dev.
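A sketch of the same question in Python, evaluating the upper-tail probabilities directly with `math.erfc` instead of a Z-table. It assumes a mean of 39.3 and a standard deviation of 2.4, the values used in the Z calculations above; because z is not rounded to two decimals here, the results differ slightly from table look-ups.

```python
import math

MEAN_AGE = 39.3
SD_AGE = 2.4  # standard deviation used in the Z calculations

def z_score(x):
    """Standardise: subtract the mean and divide by the std. dev."""
    return (x - MEAN_AGE) / SD_AGE

def upper_tail(z):
    """P(Z >= z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_a = upper_tail(z_score(40))                            # (a) P(A >= 40)
p_b = upper_tail(z_score(39)) - upper_tail(z_score(42))  # (b) P(39 <= A <= 42)

print(round(p_a, 4), round(p_b, 4))
```

Since 39 lies below the mean, its z-score is negative and P(A ≥ 39) is greater than 0.5.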
38.
“Student’s t-Distribution”
* Similar to Z-distribution (bell-shaped, symmetrical)
* Has a function of $f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$, where Γ = gamma function; v = n-1 = d.o.f.; π = 3.142
* Flatter with thicker tails
* Distributed with t ∼ (0, σ) and -∞ < t < +∞
* As n → ∞, t ∼ (0, σ) → N(0, 1)
* Probability calculation requires information on d.o.f.
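The "thicker tails" and the convergence to N(0, 1) can be illustrated with the standard Student's t density. This is a sketch using the textbook form of the density with v degrees of freedom; `math.lgamma` is used so that large v does not overflow, and the function names are illustrative.

```python
import math

def t_pdf(t, v):
    """Student's t density with v degrees of freedom."""
    log_ratio = math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
    coeff = math.exp(log_ratio) / math.sqrt(v * math.pi)
    return coeff * (1.0 + t * t / v) ** (-(v + 1) / 2.0)

def z_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# Thicker tails: at 3 units from the centre, t (5 d.o.f.) has more density than Z
tail_t, tail_z = t_pdf(3.0, 5), z_pdf(3.0)

# Flatter peak for small d.o.f., approaching the Z peak as d.o.f. grows
peak_small, peak_large, peak_z = t_pdf(0.0, 5), t_pdf(0.0, 1000), z_pdf(0.0)

print(round(tail_t, 4), round(tail_z, 4))
print(round(peak_small, 4), round(peak_large, 4), round(peak_z, 4))
```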
39.
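How Are t-dist. and Z-dist. Related?
Using central limit theorem, $\bar{x} \sim N(\mu, \sigma^2/n)$ will become z ∼ N(0, 1) as n → ∞
∴ For a large sample, t-dist. of a variable or a parameter is given by: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
The interval of critical values for variable x is: $\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$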
40.
Skewness, m3 & Kurtosis, m4
* Skewness, m3, measures degree of symmetry of distribution: $m_3 = \frac{\sum (X_i - \bar{X})^3}{n\,\sigma^3}$
* Kurtosis, m4, measures its degree of peakedness: $m_4 = \frac{\sum (X_i - \bar{X})^4}{n\,\sigma^4}$
* Xi = individual sample observation; $\bar{X}$ = sample mean; σ = std. deviation; n = sample size
* Both are useful when comparing sample distributions with different shapes
* Useful in data analysis
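A minimal sketch of both measures, assuming the common moment-based definitions: the average third (or fourth) power of the deviations from the mean, divided by σ³ (or σ⁴), with σ computed by dividing by n. Other textbooks apply small-sample corrections, so treat the exact form as an assumption; the function names are illustrative.

```python
import math

def moment_measures(values):
    """Return (skewness m3, kurtosis m4) using moment-based definitions."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    # Std. deviation with divisor n (assumed form for this sketch)
    sigma = math.sqrt(sum(d ** 2 for d in deviations) / n)
    m3 = sum(d ** 3 for d in deviations) / (n * sigma ** 3)
    m4 = sum(d ** 4 for d in deviations) / (n * sigma ** 4)
    return m3, m4

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]

skew_sym, kurt_sym = moment_measures(symmetric)
skew_right, kurt_right = moment_measures(right_skewed)

print(round(skew_sym, 3), round(kurt_sym, 3))
print(round(skew_right, 3), round(kurt_right, 3))
```

Under these definitions a symmetric sample has m3 = 0, a long right tail gives m3 > 0, and a normal distribution has m4 = 3.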