Techniques of Data Analysis

     Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman

                                                Director
                         Centre for Real Estate Studies
    Faculty of Engineering and Geoinformation Science
                         Universiti Teknologi Malaysia
                                          Skudai, Johor
Objectives

   Overall: Reinforce your understanding from the main
    lecture

   Specific:
    * Concepts of data analysis
    * Some data analysis techniques
    * Some tips for data analysis


    What I will not do:
    * Teach every bit and piece of statistical analysis
      techniques
Data analysis – “The Concept”
 Approach to de-synthesizing data, informational,
 and/or factual elements to answer research
 questions

 Method  of putting together facts and figures
 to solve research problem

 Systematic process of utilizing data to address
 research questions

 Breaking down research issues through utilizing
 controlled data and factual information
Categories of data analysis
 Narrative  (e.g. laws, arts)
 Descriptive (e.g. social sciences)
 Statistical/mathematical (pure/applied sciences)
 Audio-Optical (e.g. telecommunication)
 Others


Most research analyses, arguably, adopt the first
three.

The second and third are, arguably, most popular
in pure, applied, and social sciences
Statistical Methods
 Something to do with “statistics”
 Statistics: “meaningful” quantities about a sample of
  objects, things, persons, events, phenomena, etc.
 Widely used in social sciences.
 Simple to complex issues. E.g.
 * correlation
 * anova
 * manova
 * regression
 * econometric modelling
 Two main categories:
 * Descriptive statistics
 * Inferential statistics
Descriptive statistics
 Use sample information to explain/make
  abstraction of population “phenomena”.
 Common “phenomena”:
 * Association (e.g. σ1,2.3 = 0.75)
*  Tendency (left-skew, right-skew)
 * Causal relationship (e.g. if X, then, Y)
 * Trend, pattern, dispersion, range
 Used in non-parametric analysis (e.g. chi-
  square, t-test, 2-way anova)
Examples of “abstraction” of phenomena
 [Figure: Trends in property loan, shop house demand & supply, by year (1990 - 1997)]

 Year (1990 - 1997)                        1         2         3         4         5         6         7         8
 Loan to property sector (RM million)   32635.8   38100.6   42468.1   47684.7   48408.2   61433.6   77255.7   97810.1
 Demand for shop houses (units)           71719     73892     85843     95916    101107    117857    134864     86323
 Supply of shop houses (units)            85534     85821     90366    101508    111952    125334    143530    154179

 [Figure: No. of houses by district, 1991 and 2000]

 [Figure: Proportion (%) by age category (years old)]
Examples of “abstraction” of phenomena
 [Figure: Price (RM/sq.ft. built area) against demand (% sales success)]

 [Figure: % prediction error by distance from Rakaia (km) and distance from Ashburton (km)]
Inferential statistics
 Using sample statistics to infer some
  “phenomena” of population parameters
 Common “phenomena”: cause-and-effect
  * One-way r/ship:             Y = f(X)
  * Multi-directional r/ship:   Y1 = f(Y2, X, e1)
                                Y2 = f(Y1, Z, e2)
  * Recursive r/ship:           Y1 = f(X, e1)
                                Y2 = f(Y1, Z, e2)
 Use parametric analysis
Examples of relationship
 [Figure: two fitted trend lines, Dep = 9t – 215.8 and Dep = 7t – 192.6]




                                             Coefficientsa


                                Unstandardized          Standardized
                                   Coefficients          Coefficients
        Model                    B         Std. Error       Beta         t        Sig.
        1       (Constant)    1993.108       239.632                     8.317       .000
                Tanah           -4.472          1.199           -.190   -3.728       .000
                Bangunan         6.938           .619            .705   11.209       .000
                Ansilari         4.393          1.807            .139    2.431       .017
                Umur           -27.893          6.108           -.241   -4.567       .000
                Flo_go          34.895        89.440             .020      .390      .697
          a. Dependent Variable: Nilaism
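
The t column of the coefficients table can be checked by hand: each t value is the unstandardized coefficient B divided by its standard error. The short Python sketch below repeats that division for the figures in the table. The dictionary layout and variable names are illustrative assumptions, not part of the original slides.

```python
# B and Std. Error copied from the coefficients table above.
coefficients = {
    "(Constant)": (1993.108, 239.632),
    "Tanah": (-4.472, 1.199),
    "Bangunan": (6.938, 0.619),
    "Ansilari": (4.393, 1.807),
    "Umur": (-27.893, 6.108),
    "Flo_go": (34.895, 89.440),
}

# t statistic for each term: t = B / Std. Error
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name:<12} t = {t:8.3f}")
```

A small |t| (as for Flo_go) goes with a large Sig. value, i.e. no evidence that the variable matters in this model.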
Which one to use?
 Nature of research
 * Descriptive in nature?
 * Attempts to “infer”, “predict”, find “cause-and-effect”,
   “influence”, “relationship”?
 * Is it both?
 Research design (incl. variables involved). E.g.
 Outputs/results expected
  * research issue
  * research questions
  * research hypotheses

    At post-graduate level research, failure to choose the correct data
    analysis technique is an almost sure ingredient for thesis failure.
Common mistakes in data analysis
   Wrong techniques. E.g.
 Issue                                                           Data analysis techniques
                                                      Wrong technique              Correct technique
 To study factors that “influence” visitors to   Likert scaling based on      Data tabulation based on
 come to a recreation site                       interviews                   open-ended questionnaire
                                                                              survey

 “Effects” of KLIA on the development of         Likert scaling based on      Descriptive analysis based
 Sepang                                          interviews                   on ex-ante/ex-post
                                                                              experimental investigation

 Note: No way can Likert scaling show “cause-and-effect” phenomena!

   Infeasible techniques. E.g.
    How to design ex-ante effects of KLIA? Development
    occurs “before” and “after”! What is the control treatment?
    Further explanation!
   Abuse of statistics. E.g.
   Simply exclude a technique
Common mistakes (contd.) – “Abuse of statistics”
Issue                                                  Data analysis techniques
                                             Example of abuse           Correct technique
Measure the “influence” of a variable    Using partial correlation   Using a regression
on another                               (e.g. Spearman coeff.)      parameter
Finding the “relationship” between one   Multi-dimensional           Simple regression
variable with another                    scaling, Likert scaling     coefficient
To evaluate whether a model fits data    Using R2                    Many – a.o.t. Box-Cox
better than the other                                                χ2 test for model
                                                                     equivalence
To evaluate accuracy of “prediction”     Using R2 and/or F-value     Hold-out sample’s
                                         of a model                  MAPE
“Compare” whether a group is different Multi-dimensional             Many – a.o.t. two-way
from another                           scaling, Likert scaling       anova, χ2, Z test

To determine whether a group of          Multi-dimensional           Many – a.o.t. manova,
factors “significantly influence” the    scaling, Likert scaling     regression
observed phenomenon
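
The table recommends a hold-out sample's MAPE (mean absolute percentage error) for judging prediction accuracy. A minimal Python sketch of that measure follows. The function name and the hold-out figures are hypothetical, made up purely for illustration, and do not come from the slides.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# HYPOTHETICAL hold-out sample: observed values that were NOT used to fit
# the model, and the model's predictions for them.
holdout_actual = [100.0, 200.0, 150.0, 250.0]
holdout_predicted = [110.0, 180.0, 150.0, 275.0]

print(f"Hold-out MAPE = {mape(holdout_actual, holdout_predicted):.1f}%")
```

Unlike R2 or the F-value, this is computed on data the model has not seen, which is why it speaks to “prediction” rather than to in-sample fit.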
How to avoid mistakes - Useful tips
 Crystalize   the research problem → operability of
  it!
 Read literature on data analysis techniques.
 Evaluate various techniques that can do similar
  things w.r.t. the research problem
 Know what a technique does and what it doesn’t
 Consult people, esp. supervisor
 Pilot-run the data and evaluate results
 Don’t do research??
Principles of analysis
 Goal of an analysis:
  * To explain cause-and-effect phenomena
  * To relate research with real-world event
  * To predict/forecast the real-world
    phenomena based on research
  * Finding answers to a particular problem
  * Making conclusions about real-world event
    based on the problem
  * Learning a lesson from the problem
Principles of analysis (contd.)

 Data can’t “talk”
 An analysis contains some aspects of scientific
 reasoning/argument:
 * Define
 * Interpret
 * Evaluate
 * Illustrate
 * Discuss
 * Explain
 * Clarify
 * Compare
 * Contrast
Principles of analysis (contd.)
 An  analysis must have four elements:
  * Data/information (what)
  * Scientific reasoning/argument (what?
     who? where? how? what happens?)
  * Finding (what results?)
  * Lesson/conclusion (so what? so how?
    therefore,…)
 Example
Principles of data analysis
 Basic guide to data analysis:
 * “Analyse” NOT “narrate”
 * Go back to research flowchart
 * Break down into research objectives and
   research questions
 * Identify phenomena to be investigated
 * Visualise the “expected” answers
 * Validate the answers with data
 * Don’t tell something not supported by
   data
Principles of data analysis (contd.)
Shoppers                                           Number
Male
  Old                                                   6
  Young                                                 4
Female
  Old                                                  10
  Young                                                15
More female shoppers than male shoppers
More young female shoppers than young male shoppers
Young male shoppers are not interested to shop at the shopping complex
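
To “validate the answers with data”, each statement can be checked against the counts in the table. The Python sketch below does this for the first two statements. The variable names are illustrative assumptions. Note that the table holds counts only, so it cannot say anything about how “interested” a group is.

```python
# Counts copied from the shoppers table above.
shoppers = {
    ("Male", "Old"): 6,
    ("Male", "Young"): 4,
    ("Female", "Old"): 10,
    ("Female", "Young"): 15,
}

male_total = sum(n for (sex, _), n in shoppers.items() if sex == "Male")
female_total = sum(n for (sex, _), n in shoppers.items() if sex == "Female")

# Statement 1: more female shoppers than male shoppers
more_female = female_total > male_total

# Statement 2: more young female shoppers than young male shoppers
more_young_female = shoppers[("Female", "Young")] > shoppers[("Male", "Young")]

print(f"Male = {male_total}, Female = {female_total}")
print(f"Statement 1 supported: {more_female}")
print(f"Statement 2 supported: {more_young_female}")
```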
Data analysis (contd.)
 When   analysing:
  * Be objective
  * Accurate
  * True
 Separate facts and opinion
 Avoid “wrong” reasoning/argument. E.g.
  mistakes in interpretation.
Introductory Statistics for Social Sciences



              Basic concepts
             Central tendency
                  Variability
                 Probability
            Statistical Modelling
Basic Concepts
 Population: the whole set of a “universe”
 Sample: a sub-set of a population
 Parameter: an unknown “fixed” value of population characteristic
 Statistic: a known/calculable value of sample characteristic
  representing that of the population. E.g.
  μ = mean of population, x̄ = mean of sample

    Q: What is the mean price of houses in J.B.?
        A: RM 210,000
        Sample 1 (SD):   x̄1 = 300,000
        Sample 2 (SST):  x̄2 = 120,000
        Sample 3 (DST):  x̄3 = 210,000
        Population (J.B. houses): μ = ?
Basic Concepts (contd.)
 Randomness: Many things occur by pure
  chance…rainfall, disease, birth, death,..
 Variability: Stochastic processes bring with
  them various dimensions,
  characteristics, properties, features, etc.,
  in the population
 Statistical analysis methods have been
  developed to deal with this very nature
  of the real world.
“Central Tendency”
Measure: Mean (sum of all values ÷ no. of values)
  Advantages
             ∗ Best known average
             ∗ Exactly calculable
             ∗ Makes use of all data
             ∗ Useful for statistical analysis
  Disadvantages
             ∗ Affected by extreme values
             ∗ Can be absurd for discrete data (e.g. Family size = 4.5 persons)
             ∗ Cannot be obtained graphically

Measure: Median (middle value)
  Advantages
             ∗ Not influenced by extreme values
             ∗ Obtainable even if data distribution unknown (e.g.
               group/aggregate data)
             ∗ Unaffected by irregular class width
             ∗ Unaffected by open-ended class
  Disadvantages
             ∗ Needs interpolation for group/aggregate data (cumulative
               frequency curve)
             ∗ May not be characteristic of group when: (1) items are only few;
               (2) distribution irregular
             ∗ Very limited statistical use

Measure: Mode (most frequent value)
  Advantages
             ∗ Unaffected by extreme values
             ∗ Easy to obtain from histogram
             ∗ Determinable from only values near the modal class
  Disadvantages
             ∗ Cannot be determined exactly in group data
             ∗ Very limited statistical use
Central Tendency – “Mean”,
   For individual observations, x̄ = Σx/n. E.g.
    X = {3,5,7,7,8,8,8,9,9,10,10,12}
    Σx = 96 ; n = 12
   Thus, x̄ = 96/12 = 8
   The above observations can be organised into a frequency
    table and mean calculated on the basis of frequencies
     x    3   5   7    8    9    10   12
     f    1   1   2    3    2    2    1
     fx   3   5   14   24   18   20   12
    Σfx = 96; Σf = 12

Thus, x̄ = Σfx/Σf = 96/12 = 8
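
The two routes to the mean (raw observations and frequency table) can be compared with a few lines of Python. This is a minimal sketch, and the variable names are illustrative assumptions.

```python
from collections import Counter

# Observations from the example above.
observations = [3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12]

# Route 1: sum of all values divided by the number of values
mean_raw = sum(observations) / len(observations)

# Route 2: build the frequency table (x -> f), then divide sum(f*x) by sum(f)
frequency = Counter(observations)
sum_fx = sum(x * f for x, f in frequency.items())
sum_f = sum(frequency.values())
mean_freq = sum_fx / sum_f

print(f"Raw data:        {mean_raw}")
print(f"Frequency table: {mean_freq}")
```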
Central Tendency–“Mean of Grouped Data”
 House rental or prices in the PMR are frequently
 tabulated as a range of values. E.g.

 Rental (RM/month)        135-140   140-145   145-150   150-155   155-160

 Mid-point value (x)      137.5     142.5     147.5     152.5     157.5

 Number of Taman (f)      5         9         6         2         1

                       fx 687.5     1282.5    885.0     305.0     157.5

 What is the mean rental across the areas?
 Σf = 23; Σfx = 3317.5
 Thus, x̄ = Σfx/Σf = 3317.5/23 = 144.24
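
The grouped-data mean can be reproduced by taking each class mid-point as the representative value of its class. The Python sketch below assumes exactly that. The tuple layout and names are illustrative, not from the slides.

```python
# (lower limit, upper limit, number of Taman) for each rental class above.
rental_classes = [
    (135, 140, 5),
    (140, 145, 9),
    (145, 150, 6),
    (150, 155, 2),
    (155, 160, 1),
]

total_f = sum(f for _, _, f in rental_classes)
# Mid-point of each class times its frequency, summed over all classes
total_fx = sum((lower + upper) / 2 * f for lower, upper, f in rental_classes)
mean_rental = total_fx / total_f

print(f"Sum f = {total_f}, Sum fx = {total_fx}")
print(f"Mean rental = RM {mean_rental:.2f}/month")
```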
Central Tendency – “Median”
   Let us say house rentals in a particular town are tabulated as
    follows:
       Rental (RM/month)        130-135      135-140     140-145   145-150    150-155
       Number of Taman (f)      3            5           9         6          2
       Rental (RM/month)        < 135        < 140       < 145     < 150      < 155
       Cumulative frequency     3            8           17        23         25

   Calculation of the “median” rental needs a graphical aid (ogive) →
    1. Median = (n+1)/2 = (25+1)/2 = 13th Taman
    2. (i.e. between 10 – 15 points on the vertical axis of the ogive).
    3. Corresponds to RM 140-145/month on the horizontal axis
    4. There are (17-8) = 9 Taman in the range of RM 140-145/month
    5. The 13th Taman is the 5th out of these 9 Taman
    6. The interval width is 5
    7. Therefore, the median rental can be calculated as:
       140 + (5/9 x 5) = RM 142.8
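
The interpolation steps can be written as a small Python function. This is a sketch under two assumptions: the fourth rental class is read as 145-150, and the position of the median is taken as (n+1)/2, the convention used in the steps (some textbooks use n/2 instead). The function name `grouped_position_value` is hypothetical.

```python
def grouped_position_value(classes, position):
    """Interpolate the value at a given rank within grouped data.

    classes: list of (lower limit, upper limit, frequency), in ascending order.
    position: rank of the item wanted, e.g. (n + 1) / 2 for the median.
    """
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= position:
            # How far into this class the wanted item lies, as a fraction
            fraction = (position - cumulative) / f
            return lower + fraction * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

rentals = [(130, 135, 3), (135, 140, 5), (140, 145, 9), (145, 150, 6), (150, 155, 2)]
n = sum(f for _, _, f in rentals)

median_rental = grouped_position_value(rentals, (n + 1) / 2)
print(f"n = {n}, median rental = RM {median_rental:.1f}/month")
```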
Central Tendency – “Median” (contd.)
Central Tendency – “Quartiles” (contd.)


                       Upper quartile = ¾(n+1) = 19.5th Taman
                       UQ = 145 + (3/7 x 5) = RM 147.1/month
                       Lower quartile = (n+1)/4 = 26/4 = 6.5th Taman
                       LQ = 135 + (3.5/5 x 5) = RM 138.5/month
                       Inter-quartile = UQ – LQ = 147.1 – 138.5 = 8.6th. Taman
                       IQ = 138.5 + (4/5 x 5) = RM 142.5/month
“Variability”
 Indicates dispersion, spread, variation, deviation
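 For single population or sample data:

   $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$

   $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

 where σ² and s² = population and sample variance respectively, xi =
 individual observations, μ = population mean, x̄ = sample mean, and n
 = total number of individual observations.
 The square roots are:

   $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$      (population standard deviation)

   $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$      (sample standard deviation)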
“Variability” (contd.)
 Why is a “measure of dispersion” important?
 Consider returns from two categories of shares:


 * Shares A (%) = {1.8, 1.9, 2.0, 2.1, 3.6}
 * Shares B (%) = {1.0, 1.5, 2.0, 3.0, 3.9}

   Mean A = mean B = 2.28%
   But, different variability!
   Var(A) = 0.557, Var(B) = 1.367

  * Would you invest in category A shares or
    category B shares?
“Variability” (contd.)
 Coefficient of variation – COV – std. deviation as
  % of the mean:

   $COV = \frac{s}{\bar{x}} \times 100\%$

 Could be a better measure compared to std. dev.
  COV(A) = 32.73%, COV(B) = 51.28%
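
The figures for the two categories of shares can be reproduced with Python's standard `statistics` module. The sketch assumes the sample (n - 1) form of the variance, which is the form that matches the quoted values. The helper name `summarise` is hypothetical.

```python
import statistics

shares_a = [1.8, 1.9, 2.0, 2.1, 3.6]  # returns (%), category A
shares_b = [1.0, 1.5, 2.0, 3.0, 3.9]  # returns (%), category B

def summarise(returns):
    """Return (mean, sample variance, coefficient of variation in %)."""
    mean = statistics.mean(returns)
    variance = statistics.variance(returns)    # divides by n - 1
    cov = statistics.stdev(returns) / mean * 100
    return mean, variance, cov

mean_a, var_a, cov_a = summarise(shares_a)
mean_b, var_b, cov_b = summarise(shares_b)

print(f"A: mean = {mean_a:.2f}%, var = {var_a:.3f}, COV = {cov_a:.2f}%")
print(f"B: mean = {mean_b:.2f}%, var = {var_b:.3f}, COV = {cov_b:.2f}%")
```

Both categories have the same mean return, so only the dispersion measures separate them.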
“Variability” (contd.)
 Std. dev. of a frequency distribution
  The following table shows the age distribution of second-time home buyers:

  [Table: age distribution of second-time home buyers]
“Probability Distribution”
 Defined by a probability density function (pdf).
 Many types: Z, t, F, gamma, etc.
 “God-given” nature of the real world event.
 General form:                         (continuous)

                                         (discrete)

 E.g.
“Probability Distribution” (contd.)


                            Dice 1
 Dice 2     1      2      3      4      5      6
    1       2      3      4      5      6      7
    2       3      4      5      6      7      8
    3       4      5      6      7      8      9
    4       5      6      7      8      9     10
    5       6      7      8      9     10     11
    6       7      8      9     10     11     12
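
The table of totals can be turned into a discrete probability distribution by counting how often each total occurs among the 36 equally likely outcomes. The Python sketch below assumes two fair dice. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 (Dice 1, Dice 2) outcomes and the total shown in each cell
totals = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# Probability of each total = number of cells with that total / 36
pmf = {total: Fraction(count, 36) for total, count in sorted(totals.items())}

for total, p in pmf.items():
    print(f"P(X = {total:2d}) = {p}")

print("Sum of probabilities =", sum(pmf.values()))
```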
“Probability Distribution” (contd.)


 [Figure: two panels, each with “Discrete values” on the horizontal axis]

Values of x are discrete (discontinuous)

Sum of lengths of vertical bars: Σp(X=x) = 1, summed over all x
“Probability Distribution” (contd.)
 ▪ Many real world phenomena take a form of continuous
   random variable
 ▪ Can take any values between two limits (e.g. income, age,
   weight, price, rental, etc.)

 [Figure: histogram of frequency against rental (RM/sq.ft.); Mean = 4.0628, Std. Dev. = 1.70319, N = 32]
“Probability Distribution” (contd.)




P(Rental = RM 8) = 0      P(Rental < RM 3.00) =   0.206


P(Rental < RM7) = 0.972   P(Rental ≥ RM 4.00) = 0.544
“Probability Distribution” (contd.)
 Ideal distribution of such phenomena:
   * Bell-shaped, symmetrical
   * Has a function of

     $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

     where μ = mean of variable x
           σ = std. dev. of x
           π = ratio of circumference of a circle to its diameter = 3.14
           e = base of natural log = 2.71828
“Probability distribution”




μ ± 1σ = ?           = ____% from total observation
μ ± 2σ = ?           = ____% from total observation
μ ± 3σ = ?           = ____% from total observation
“Probability distribution”
* Has the following distribution of observation
“Probability distribution”
 There are various other types and/or shapes of
 distribution. E.g.




                                    Note: Σp(AGE=age) ≠ 1
                                    How to turn this graph into
                                    a probability distribution
                                    function (p.d.f.)?




 Not   “ideally” shaped like the previous one
“Z-Distribution”
   φ(X=x) is given by area under curve
   Has no standard algebraic method of integration → Z ~ N(0,1)
   It is called “normal distribution” (ND)
   Standard reference/approximation of other distributions. Since there
    are various f(x) forming NDs, SND is needed
   To transform f(x) into f(z):
           x-µ
     Z = --------- ~ N(0, 1)
            σ
               160 –155
     E.g. Z = ------------- = 0.926
                   5.4

   Probability is such a way that:
    * Approx. 68% -1< z <1
    * Approx. 95% -1.96 < z < 1.96
    * Approx. 99% -2.58 < z < 2.58
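
The transformation and the three approximate percentages can be checked numerically. The Python sketch below relies on the identity P(-z < Z < z) = erf(z/√2) for a standard normal variable. The function names are illustrative assumptions.

```python
import math

def z_score(x, mu, sigma):
    """Transform x into a standard normal value: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

def central_area(z):
    """P(-z < Z < z) for Z ~ N(0, 1)."""
    return math.erf(z / math.sqrt(2))

print(f"Z = {z_score(160, 155, 5.4):.3f}")
for z in (1.0, 1.96, 2.58):
    print(f"P(-{z} < Z < {z}) = {central_area(z):.4f}")
```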
“Z-distribution” (contd.)

 When   X= μ, Z = 0, i.e.



 When   X = μ + σ, Z = 1
 When X = μ + 2σ, Z = 2
 When X = μ + 3σ, Z = 3 and so on.
 It can be proven that P(X1 <X< Xk) = P(Z1 <Z< Zk)
 SND   shows the probability to the right of any
  particular value of Z.
 Example
Normal distribution…Questions
Your sample found that the mean price of “affordable” homes in Johor
Bahru, Y, is RM 155,000 with a variance of RM 3.8 x 10^7. On the basis of a
normality assumption, how sure are you that:

(a)   The mean price is really ≤ RM 160,000
(b)   The mean price is between RM 145,000 and 160,000

 Answer (a):
                             160,000 – 155,000
 P(Y ≤ 160,000) = P(Z ≤ ---------------------)
                               √(3.8 x 10^7)
                = P(Z ≤ 0.811)

 Using the Z-table, the area to the right of Z = 0.811 is 0.2087, so the
 required probability is:
       1 – 0.2087 = 0.7913



Always remember: to convert to SND, subtract the mean and divide by the std. dev.
Normal distribution…Questions
Answer (b):

      X1 – μ    145,000 – 155,000
 Z1 = ------ = ----------------- = -1.622
        σ         √(3.8 x 10^7)

      X2 – μ    160,000 – 155,000
 Z2 = ------ = ----------------- = 0.811
        σ         √(3.8 x 10^7)

P(Z < -1.622) = 0.0524; P(Z > 0.811) = 0.2087
∴ P(145,000 < Y < 160,000)
  = 1 – (0.0524 + 0.2087)
  = 0.7389
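
Both answers can be cross-checked without a Z-table. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) and assumes, as the question does, that Y is normally distributed with mean 155,000 and variance 3.8 x 10^7. The variable names are illustrative.

```python
from statistics import NormalDist

mean_price = 155_000
variance = 3.8e7
price = NormalDist(mu=mean_price, sigma=variance ** 0.5)

# Standardised values: subtract the mean, divide by the std. dev.
z1 = (145_000 - mean_price) / price.stdev
z2 = (160_000 - mean_price) / price.stdev

# (a) P(Y <= 160,000)
p_a = price.cdf(160_000)

# (b) P(145,000 < Y < 160,000)
p_b = price.cdf(160_000) - price.cdf(145_000)

print(f"Z1 = {z1:.3f}, Z2 = {z2:.3f}")
print(f"(a) P(Y <= 160,000) = {p_a:.4f}")
print(f"(b) P(145,000 < Y < 160,000) = {p_b:.4f}")
```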
Normal distribution…Questions
You are told by a property consultant that the
average rental for a shop house in Johor Bahru is
RM 3.20 per sq. ft. After searching, you discovered
the following rental data:

2.20, 3.00, 2.00, 2.50, 3.50, 3.20, 2.60, 2.00,
3.10, 2.70

What is the probability that the rental is greater
than RM 3.00?
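
One possible way to approach this question is sketched below. It assumes that an individual rental is normally distributed and that the sample mean and sample standard deviation (n - 1 divisor) can stand in for μ and σ. That reading is an assumption; with only ten observations a t-based treatment may be more appropriate.

```python
from statistics import NormalDist, mean, stdev

rentals = [2.20, 3.00, 2.00, 2.50, 3.50, 3.20, 2.60, 2.00, 3.10, 2.70]

x_bar = mean(rentals)            # sample mean
s = stdev(rentals)               # sample std. dev. (n - 1 divisor)

z_rent = (3.00 - x_bar) / s      # standardise the threshold RM 3.00
p_above = 1.0 - NormalDist().cdf(z_rent)   # area to the right of z_rent

print(round(x_bar, 2), round(s, 3), round(z_rent, 3), round(p_above, 3))
```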
“Student’s t-Distribution”


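 Similar to Z-distribution:
 * t(0, σ), but σ → 1 as n → ∞
 * -∞ < t < +∞
 * Flatter with thicker tails
 * As n → ∞, t(0, σ) → N(0, 1)
 * Has a function of
   $$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^{2}}{v}\right)^{-\frac{v+1}{2}}$$
   where Γ = gamma function; v = n-1 = d.o.f.; π = 3.142
 * Probability calculation requires information on
   d.o.f.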
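
The "flatter with thicker tails" property and the convergence to N(0,1) can be seen numerically. The sketch below uses the standard closed form of the t density in terms of the gamma function; the degrees of freedom shown are arbitrary illustrative choices.

```python
import math

def t_pdf(t, v):
    """Density of Student's t with v degrees of freedom."""
    coef = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coef * (1 + t * t / v) ** (-(v + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# Compare the peak (t = 0) and a tail point (t = 3) as d.o.f. grow
for v in (2, 10, 100):
    print(v, round(t_pdf(0.0, v), 4), round(t_pdf(3.0, v), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

For small v the peak is lower and the tail value higher than for N(0,1); both gaps shrink as v increases.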
“Student’s t-Distribution”


 Given n independent measurements, xi, let

   $$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

 where μ is the population mean, x̄ is the sample
 mean, and s is the estimator for population
 standard deviation.

 This is the distribution of the random variable t, which is
  (very loosely) the "best" that we can do not
  knowing σ.
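
A minimal sketch of computing the t statistic from a sample follows. It reuses the ten shop-house rentals from the earlier question and treats the consultant's RM 3.20 as a hypothesised population mean; pairing the two in this way is an assumption made for illustration.

```python
import math
from statistics import mean, stdev

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s using the n - 1 divisor."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))

rent_sample = [2.20, 3.00, 2.00, 2.50, 3.50, 3.20, 2.60, 2.00, 3.10, 2.70]
t_value = t_statistic(rent_sample, 3.20)
dof = len(rent_sample) - 1       # degrees of freedom, n - 1

print(dof, round(t_value, 2))
```

The resulting value would then be compared with a t-table entry for the given degrees of freedom.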
“Student’s t-Distribution”


 Student's t-distribution can be derived by:

  * transforming Student's z-distribution using



  * defining

 The resulting probability and cumulative
  distribution functions are:
“Student’s t-Distribution”

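$$f_r(t) = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{r\pi}\,\Gamma\left(\frac{r}{2}\right)\left(1 + \frac{t^{2}}{r}\right)^{\frac{r+1}{2}}} = \frac{\left(\frac{r}{r + t^{2}}\right)^{\frac{r+1}{2}}}{\sqrt{r}\,B\left(\frac{r}{2}, \frac{1}{2}\right)}$$

$$F_r(t) = \int_{-\infty}^{t} f_r(u)\,du = \frac{1}{2} + \frac{1}{2}\left[I\left(1; \tfrac{r}{2}, \tfrac{1}{2}\right) - I\left(\tfrac{r}{r + t^{2}}; \tfrac{r}{2}, \tfrac{1}{2}\right)\right]\operatorname{sgn}(t) = \frac{1}{2} + \frac{1}{2}\left[1 - I\left(\tfrac{r}{r + t^{2}}; \tfrac{r}{2}, \tfrac{1}{2}\right)\right]\operatorname{sgn}(t)$$

    where r ≡ n-1 is the number of degrees of freedom, -∞<t<∞, Γ(t) is the gamma function,
    B(a,b) is the beta function, and I(z;a,b) is the regularized beta function defined by

$$I(z; a, b) = \frac{B(z; a, b)}{B(a, b)}$$

    with B(z;a,b) the incomplete beta function.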
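
As a hedged numerical check, the gamma-function and beta-function forms of the density can be compared, and the cumulative distribution can be approximated by integrating the density. The sketch uses Simpson's rule in place of the regularized beta function, which the Python standard library does not provide; that substitution is a choice made here for illustration.

```python
import math

def t_pdf_gamma(t, r):
    """t density written with gamma functions."""
    coef = math.gamma((r + 1) / 2) / (math.sqrt(r * math.pi) * math.gamma(r / 2))
    return coef / (1 + t * t / r) ** ((r + 1) / 2)

def t_pdf_beta(t, r):
    """Same density written with the beta function B(r/2, 1/2)."""
    beta = math.gamma(r / 2) * math.gamma(0.5) / math.gamma((r + 1) / 2)
    return (r / (r + t * t)) ** ((r + 1) / 2) / (math.sqrt(r) * beta)

def t_cdf(t, r, steps=2000):
    """F_r(t) = 1/2 + integral of the density from 0 to t (Simpson's rule, even steps)."""
    if t == 0:
        return 0.5
    h = t / steps                      # negative h handles t < 0
    total = t_pdf_gamma(0.0, r) + t_pdf_gamma(t, r)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf_gamma(i * h, r)
    return 0.5 + total * h / 3

print(abs(t_pdf_gamma(1.5, 7) - t_pdf_beta(1.5, 7)) < 1e-12)  # the two forms agree
print(round(t_cdf(1.0, 1), 4))  # r = 1 is the Cauchy case
```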
Forms of “statistical” relationship
 Correlation
 Contingency
 Cause-and-effect
 * Causal
 * Feedback
 * Multi-directional
 * Recursive
 The last two categories are normally dealt with
  through regression
Correlation
   “Co-exist”. E.g.
    * left shoe & right shoe, sleep & lying down, food & drink
   Indicates “some” co-existence relationship. E.g.
    * Linearly associated (-ve or +ve)
    * Co-dependent, independent
   But it has nothing to do with a cause-and-effect (C-A-E) relationship!
    Example: After a field survey, you have the following
    data on the distance to work and distance to the city
    of residents in the J.B. area. Interpret the results.
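
A linear association of this kind is commonly summarised with Pearson's correlation coefficient. The sketch below assumes that measure is the one intended, and the distances are hypothetical values invented for illustration, not the survey data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical distances in km (not the survey data)
dist_work = [2.0, 4.5, 6.0, 8.5, 10.0, 12.5]
dist_city = [14.0, 11.5, 10.0, 7.0, 6.5, 3.0]

r_xy = pearson_r(dist_work, dist_city)
print(round(r_xy, 3))   # close to -1: living nearer to work goes with living farther from the city
```

A coefficient near -1 or +1 indicates a strong linear association only; it says nothing about which variable, if either, causes the other.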
Contingency
   A form of “conditional” co-existence:
    * If X, then, NOT Y; if Y, then, NOT X
    * If X, then, ALSO Y
    * E.g.
       + if they choose to live close to workplace,
         then, they will stay away from city
       + if they choose to live close to city, then, they
         will stay away from workplace
       + they will stay close to both workplace and city
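
One common way to examine such conditional co-existence is a contingency table with a chi-square test of independence. The sketch below assumes that approach; the counts are hypothetical.

```python
def chi_square(table):
    """Chi-square statistic: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts of households
#                near city  far from city
counts = [[5, 25],    # near workplace
          [30, 10]]   # far from workplace

chi2 = chi_square(counts)
print(round(chi2, 2))   # compare with 3.841, the 5% critical value for 1 d.o.f.
```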
Correlation and regression – matrix approach
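
The matrix approach to regression is usually written as b = (X'X)^-1 X'y. The sketch below assumes that standard least-squares form, works through the case of an intercept plus one regressor where X'X is a 2 × 2 matrix, and uses hypothetical data.

```python
def ols_simple(x, y):
    """Solve b = (X'X)^-1 X'y for X = [1, x] (intercept and one regressor)."""
    n = len(x)
    sum_x = sum(x)
    sum_xx = sum(v * v for v in x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sum_x], [sum_x, sum_xx]] and X'y = [sum_y, sum_xy]
    det = n * sum_xx - sum_x * sum_x            # determinant of X'X
    b0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# Hypothetical data lying exactly on y = 2 + 3x
x_vals = [1.0, 2.0, 3.0, 4.0]
y_vals = [5.0, 8.0, 11.0, 14.0]

intercept, slope = ols_simple(x_vals, y_vals)
print(intercept, slope)   # 2.0 3.0
```

With more regressors the same formula applies, but the inverse is normally computed with a linear algebra library.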
Test yourselves!

Q1: Calculate the mean and variance of the following data:

PRICE - RM ‘000             130 137 128 390 140 241 342 143
SQ. M OF FLOOR              135 140 100 360 175 270 200 170


Q2: Calculate the mean price of the following low-cost houses, in various
localities across the country:


 PRICE - RM ‘000 (x)         36    37   38    39    40   41      42   43

 NO. OF LOCALITIES (f)       3     14   10    36    73   27      20   17
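
For frequency data such as Q2, the mean is Σfx / Σf. The following sketch is one way to check a hand calculation; the function name is illustrative.

```python
def frequency_mean(values, freqs):
    """Mean of grouped observations: sum of f*x divided by sum of f."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f

prices = [36, 37, 38, 39, 40, 41, 42, 43]          # RM '000 (x)
localities = [3, 14, 10, 36, 73, 27, 20, 17]       # number of localities (f)

print(sum(localities), frequency_mean(prices, localities))
```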
Test yourselves!
Q3: From sample information, a population of housing
estates is believed to have a “normal” distribution, X ~ N(155,
45). What is the general adjustment needed to obtain a Standard
Normal Distribution for this population?

Q4: Consider the following ROI for two types of investment:

A: 3.6, 4.6, 4.6, 5.2, 4.2, 6.5
B: 3.3, 3.4, 4.2, 5.5, 5.8, 6.8

Decide which investment you would choose.
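
One hedged way to compare the two investments is the coefficient of variation, i.e. the standard deviation as a percentage of the mean. The sketch assumes the sample standard deviation (n - 1 divisor); the final choice still depends on the investor's attitude to risk.

```python
from statistics import mean, stdev

def coefficient_of_variation(returns):
    """Sample std. dev. as a percentage of the mean."""
    return 100.0 * stdev(returns) / mean(returns)

roi_a = [3.6, 4.6, 4.6, 5.2, 4.2, 6.5]
roi_b = [3.3, 3.4, 4.2, 5.5, 5.8, 6.8]

cov_a = coefficient_of_variation(roi_a)
cov_b = coefficient_of_variation(roi_b)

# Mean return and relative variability of each investment
print(round(mean(roi_a), 2), round(cov_a, 1))
print(round(mean(roi_b), 2), round(cov_b, 1))
```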
Test yourselves!




Q5: Find:
φ(AGE > “30-34”)
φ(AGE ≤ “20-24”)
φ( “35-39”≤ AGE < “50-54”)
Test yourselves!
Q6: You are asked by a property marketing manager to ascertain whether
or not distance to work and distance to the city are “equally” important
factors influencing people’s choice of house location.

You are given the following data for the purpose of testing:

Explore the data as follows:
• Create histograms for both distances. Comment on the shape of the
  histograms. What is your conclusion?
• Construct scatter diagram of both distances. Comment on the output.
• Explore the data and give some analysis.
• Set a hypothesis that means of both distances are the same. Make
  your conclusion.
Test yourselves! (contd.)

Q7: From your initial investigation, you believe that tenants of
“low-quality” housing choose to rent particular flat units just
to find shelter. In this context, these groups of people do
not pay much attention to pertinent aspects of “quality
life” such as accessibility, good surroundings, security, and
physical facilities in the living areas.

(a) Set your research design and data analysis procedure to address
the research issue
(b) Test your hypothesis that low-income tenants do not perceive
   “quality life” to be important in paying their house rentals.
Thank you

Data analysis

  • 1. Techniques of Data Analysis Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman Director Centre for Real Estate Studies Faculty of Engineering and Geoinformation Science Universiti Tekbnologi Malaysia Skudai, Johor
  • 2. Objectives  Overall: Reinforce your understanding from the main lecture  Specific: * Concepts of data analysis * Some data analysis techniques * Some tips for data analysis  What I will not do: * To teach every bit and pieces of statistical analysis techniques
  • 3. Data analysis – “The Concept”  Approach to de-synthesizing data, informational, and/or factual elements to answer research questions  Method of putting together facts and figures to solve research problem  Systematicprocess of utilizing data to address research questions  Breaking down research issues through utilizing controlled data and factual information
  • 4. Categories of data analysis  Narrative (e.g. laws, arts)  Descriptive (e.g. social sciences)  Statistical/mathematical (pure/applied sciences)  Audio-Optical (e.g. telecommunication)  Others Most research analyses, arguably, adopt the first three. The second and third are, arguably, most popular in pure, applied, and social sciences
  • 5. Statistical Methods  Something to do with “statistics”  Statistics: “meaningful” quantities about a sample of objects, things, persons, events, phenomena, etc.  Widely used in social sciences.  Simple to complex issues. E.g. * correlation * anova * manova * regression * econometric modelling  Two main categories: * Descriptive statistics * Inferential statistics
  • 6. Descriptive statistics  Use sample information to explain/make abstraction of population “phenomena”.  Common “phenomena”:  * Association (e.g. σ1,2.3 = 0.75) * Tendency (left-skew, right-skew)  * Causal relationship (e.g. if X, then, Y)  * Trend, pattern, dispersion, range  Used in non-parametric analysis (e.g. chi- square, t-test, 2-way anova)
  • 7. Examples of “abstraction” of phenomena 350,000 200000 300,000 No. of houses 250,000 150000 200,000 1991 100000 150,000 2000 100,000 50000 50,000 0 1 2 3 4 5 6 7 8 0 Loan t o pr opert y sect or (RM 32635.8 38100.6 42468.1 47684.7 48408.2 61433.6 77255.7 97810.1 Kl u M ggi ta ng Se tian at rB t Po ar ng ho h a r million) ah u m Ko ua si n M J o Pa n ga Ti er Demand f or shop shouses (unit s) 71719 73892 85843 95916 101107 117857 134864 86323 tu Supply of shop houses (unit s) 85534 85821 90366 101508 111952 125334 143530 154179 Ba Year (1990 - 1997) Trends in property loan, shop house dem and & supply District 14 12 Proportion (%) 10 8 6 4 2 0 4 4 4 4 4 4 4 4 -1 -2 -3 -4 -5 -6 -7 0- 10 20 30 40 50 60 70 Age Category (Years Old)
  • 8. Examples of “abstraction” of phenomena 200 50.00 180 % prediction Distance from Rakaia (km) 160 40.00 error Price (RM/sq.ft. built area) 140 30.00 100.00 80.00 60.00 40.00 120 20.00 20.00 0.00 -20.00 100 10.00 -40.00 -60.00 -80.00 80 -100.00 20 40 60 80 100 120 10.00 20.00 30.00 40.00 50.00 60.00 Distance from Ashurton (km) Demand (% sales success)
  • 9. Inferential statistics  Using sample statistics to infer some “phenomena” of population parameters  Common “phenomena”: cause-and-effect * One-way r/ship Y = f(X) * Multi-directional r/ship Y1 = f(Y2, X, e1) Y2 = f(Y1, Z, e2) * Recursive Y1 = f(X, e1) Y2 = f(Y1, Z, e2)  Use parametric analysis
  • 10. Examples of relationship Dep=9t – 215.8 Dep=7t – 192.6 Coefficientsa Unstandardized Standardized Coefficients Coefficients Model B Std. Error Beta t Sig. 1 (Constant) 1993.108 239.632 8.317 .000 Tanah -4.472 1.199 -.190 -3.728 .000 Bangunan 6.938 .619 .705 11.209 .000 Ansilari 4.393 1.807 .139 2.431 .017 Umur -27.893 6.108 -.241 -4.567 .000 Flo_go 34.895 89.440 .020 .390 .697 a. Dependent Variable: Nilaism
  • 11. Which one to use?  Nature of research * Descriptive in nature? * Attempts to “infer”, “predict”, find “cause-and-effect”, “influence”, “relationship”? * Is it both?  Research design (incl. variables involved). E.g.  Outputs/results expected * research issue * research questions * research hypotheses At post-graduate level research, failure to choose the correct data analysis technique is an almost sure ingredient for thesis failure.
  • 12. Common mistakes in data analysis  Wrong techniques. E.g. Issue Data analysis techniques Wrong technique Correct technique To study factors that “influence” visitors to Likert scaling based on Data tabulation based on come to a recreation site interviews open-ended questionnaire survey “Effects” of KLIA on the development of Likert scaling based on Descriptive analysis based Sepang interviews on ex-ante post-ante experimental investigation Note: No way can Likert scaling show “cause-and-effect” phenomena!  Infeasible techniques. E.g. How to design ex-ante effects of KLIA? Development occurs “before” and “after”! What is the control treatment? Further explanation!  Abuse of statistics. E.g.  Simply exclude a technique
  • 13. Common mistakes (contd.) – “Abuse of statistics” Issue Data analysis techniques Example of abuse Correct technique Measure the “influence” of a variable Using partial correlation Using a regression on another (e.g. Spearman coeff.) parameter Finding the “relationship” between one Multi-dimensional Simple regression variable with another scaling, Likert scaling coefficient To evaluate whether a model fits data Using R2 Many – a.o.t. Box-Cox better than the other χ2 test for model equivalence To evaluate accuracy of “prediction” Using R2 and/or F-value Hold-out sample’s of a model MAPE “Compare” whether a group is different Multi-dimensional Many – a.o.t. two-way from another scaling, Likert scaling anova, χ2, Z test To determine whether a group of Multi-dimensional Many – a.o.t. manova, factors “significantly influence” the scaling, Likert scaling regression observed phenomenon
  • 14. How to avoid mistakes - Useful tips  Crystalize the research problem → operability of it!  Read literature on data analysis techniques.  Evaluate various techniques that can do similar things w.r.t. to research problem  Know what a technique does and what it doesn’t  Consult people, esp. supervisor  Pilot-run the data and evaluate results  Don’t do research??
  • 15. Principles of analysis  Goal of an analysis: * To explain cause-and-effect phenomena * To relate research with real-world event * To predict/forecast the real-world phenomena based on research * Finding answers to a particular problem * Making conclusions about real-world event based on the problem * Learning a lesson from the problem
  • 16. Principles of analysis (contd.)  Data can’t “talk”  An analysis contains some aspects of scientific reasoning/argument: * Define * Interpret * Evaluate * Illustrate * Discuss * Explain * Clarify * Compare * Contrast
  • 17. Principles of analysis (contd.)  An analysis must have four elements: * Data/information (what) * Scientific reasoning/argument (what? who? where? how? what happens?) * Finding (what results?) * Lesson/conclusion (so what? so how? therefore,…)  Example
  • 18. Principles of data analysis  Basic guide to data analysis: * “Analyse” NOT “narrate” * Go back to research flowchart * Break down into research objectives and research questions * Identify phenomena to be investigated * Visualise the “expected” answers * Validate the answers with data * Don’t tell something not supported by data
  • 19. Principles of data analysis (contd.) Shoppers Number Male Old 6 Young 4 Female Old 10 Young 15 More female shoppers than male shoppers More young female shoppers than young male shoppers Young male shoppers are not interested to shop at the shopping complex
  • 20. Data analysis (contd.)  When analysing: * Be objective * Accurate * True  Separate facts and opinion  Avoid “wrong” reasoning/argument. E.g. mistakes in interpretation.
  • 21. Introductory Statistics for Social Sciences Basic concepts Central tendency Variability Probability Statistical Modelling
  • 22. Basic Concepts  Population: the whole set of a “universe”  Sample: a sub-set of a population  Parameter: an unknown “fixed” value of population characteristic  Statistic: a known/calculable value of sample characteristic representing that of the population. E.g. μ = mean of population, = mean of sample Q: What is the mean price of houses in J.B.? A: RM 210,000 = 300,000 = 120,000 1 2 SD SST = 210,000 3 J.B. houses DST μ=?
  • 23. Basic Concepts (contd.)  Randomness: Many things occur by pure chances…rainfall, disease, birth, death,..  Variability: Stochastic processes bring in them various different dimensions, characteristics, properties, features, etc., in the population  Statistical analysis methods have been developed to deal with these very nature of real world.
  • 24. “Central Tendency” Measure Advantages Disadvantages Mean ∗ Best known average ∗ Affected by extreme values (Sum of ∗ Can be absurd for discrete data ∗ Exactly calculable all values ÷ ∗ Make use of all data (e.g. Family size = 4.5 person) no. of ∗ Useful for statistical analysis ∗ Cannot be obtained graphically values) Median ∗ Not influenced by extreme ∗ Needs interpolation for group/ (middle values aggregate data (cumulative value) ∗ Obtainable even if data frequency curve) distribution unknown (e.g. ∗ May not be characteristic of group group/aggregate data) when: (1) items are only few; (2) ∗ Unaffected by irregular class distribution irregular width ∗ Very limited statistical use ∗ Unaffected by open-ended class Mode ∗ Unaffected by extreme values ∗ Cannot be determined exactly in (most group data ∗ Easy to obtain from histogram frequent value) ∗ Determinable from only values ∗ Very limited statistical use near the modal class
  • 25. Central Tendency – “Mean”,  For individual observations, . E.g. X = {3,5,7,7,8,8,8,9,9,10,10,12} = 96 ; n = 12  Thus, = 96/12 = 8  The above observations can be organised into a frequency table and mean calculated on the basis of frequencies x 3 5 7 8 9 10 12 f 1 1 2 3 2 2 1 = 96; = 12 Σf 3 5 14 24 18 20 12 Thus, = 96/12 = 8
  • 26. Central Tendency–“Mean of Grouped Data”  House rental or prices in the PMR are frequently tabulated as a range of values. E.g. Rental (RM/month) 135-140 140-145 145-150 150-155 155-160 Mid-point value (x) 137.5 142.5 147.5 152.5 157.5 Number of Taman (f) 5 9 6 2 1 fx 687.5 1282.5 885.0 305.0 157.5  What is the mean rental across the areas? = 23; = 3317.5 Thus, = 3317.5/23 = 144.24
  • 27. Central Tendency – “Median”  Let say house rentals in a particular town are tabulated as follows: Rental (RM/month) 130-135 135-140 140-145 155-50 150-155 Number of Taman (f) 3 5 9 6 2 Rental (RM/month) >135 > 140 > 145 > 150 > 155 Cumulative frequency 3 8 1 23 25  Calculation of “median” rental needs a graphical aids→ 1. Median = (n+1)/2 = (25+1)/2 =13th. 5. Taman 13th. is 5th. out of the 9 Taman Taman 2. (i.e. between 10 – 15 points on the vertical axis of ogive). 6. The interval width is 5 3. Corresponds to RM 140- 7. Therefore, the median rental can 145/month on the horizontal axis be calculated as: 4. There are (17-8) = 9 Taman in the 140 + (5/9 x 5) = RM 142.8 range of RM 140-145/month
  • 28. Central Tendency – “Median” (contd.)
  • 29. Central Tendency – “Quartiles” (contd.) Upper quartile = ¾(n+1) = 19.5th. Taman UQ = 145 + (3/7 x 5) = RM 147.1/month Lower quartile = (n+1)/4 = 26/4 = 6.5 th. Taman LQ = 135 + (3.5/5 x 5) = RM138.5/month Inter-quartile = UQ – LQ = 147.1 – 138.5 = 8.6th. Taman IQ = 138.5 + (4/5 x 5) = RM 142.5/month
  • 30. “Variability”  Indicates dispersion, spread, variation, deviation  For single population or sample data: where σ2 and s2 = population and sample variance respectively, xi = individual observations, μ = population mean, = sample mean, and n = total number of individual observations.  The square roots are: standard deviation standard deviation
  • 31. “Variability” (contd.)  Why “measure of dispersion” important?  Consider returns from two categories of shares: * Shares A (%) = {1.8, 1.9, 2.0, 2.1, 3.6} * Shares B (%) = {1.0, 1.5, 2.0, 3.0, 3.9} Mean A = mean B = 2.28% But, different variability! Var(A) = 0.557, Var(B) = 1.367 * Would you invest in category A shares or category B shares?
  • 32. “Variability” (contd.)  Coefficient of variation – COV – std. deviation as % of the mean:  Could be a better measure compared to std. dev. COV(A) = 32.73%, COV(B) = 51.28%
  • 33. “Variability” (contd.)  Std. dev. of a frequency distribution The following table shows the age distribution of second-time home buyers: x^
  • 34. “Probability Distribution”  Defined as of probability density function (pdf).  Many types: Z, t, F, gamma, etc.  “God-given” nature of the real world event.  General form: (continuous) (discrete)  E.g.
  • 35. “Probability Distribution” (contd.) Dice2 Dice1 1 2 3 4 5 6 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12
  • 36. “Probability Distribution” (contd.) Discrete values Discrete values Values of x are discrete (discontinuous) Sum of lengths of vertical bars Σp(X=x) = 1 all x
  • 37. “Probability Distribution” (contd.) 8 ▪ Many real world phenomena take a form of continuous random variable 6 ▪ Can take any values between two limits (e.g. income, age, weight, price, rental, etc.) 4 F n u q y c e r 2 Mean = 4.0628 Std. Dev. = 1.70319 N = 32 0 2.00 3.00 4.00 5.00 6.00 7.00 Rental (RM/sq.ft.)
  • 38. “Probability Distribution” (contd.) P(Rental = RM 8) = 0 P(Rental < RM 3.00) = 0.206 P(Rental < RM7) = 0.972 P(Rental ≥ RM 4.00) = 0.544
  • 39. “Probability Distribution” (contd.)  Ideal distribution of such phenomena: * Bell-shaped, symmetrical μ = mean of variable x σ = std. dev. Of x * Has a function of π = ratio of circumference of a circle to its diameter = 3.14 e = base of natural log = 2.71828
  • 40. “Probability distribution” μ ± 1σ = ? = ____% from total observation μ ± 2σ = ? = ____% from total observation μ ± 3σ = ? = ____% from total observation
  • 41. “Probability distribution” * Has the following distribution of observation
  • 42. “Probability distribution”  There are various other types and/or shapes of distribution. E.g. Note: Σp(AGE=age) ≠ 1 How to turn this graph into a probability distribution function (p.d.f.)?  Not “ideally” shaped like the previous one
  • 43. “Z-Distribution”  φ(X=x) is given by area under curve  Has no standard algebraic method of integration → Z ~ N(0,1)  It is called “normal distribution” (ND)  Standard reference/approximation of other distributions. Since there are various f(x) forming NDs, SND is needed  To transform f(x) into f(z): x-µ Z = --------- ~ N(0, 1) σ 160 –155 E.g. Z = ------------- = 0.926 5.4  Probability is such a way that: * Approx. 68% -1< z <1 * Approx. 95% -1.96 < z < 1.96 * Approx. 99% -2.58 < z < 2.58
  • 44. “Z-distribution” (contd.)  When X = μ, Z = 0  When X = μ + σ, Z = 1  When X = μ + 2σ, Z = 2  When X = μ + 3σ, Z = 3 and so on.  It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)  The SND (Z) table shows the probability to the right of any particular value of Z.  Example
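The transformation Z = (x − μ)/σ and the claim that P(X1 < X < Xk) = P(Z1 < Z < Zk) can be checked numerically. A minimal sketch follows; μ = 155 and σ = 5.4 come from the slide's example, while the interval end points `x1` and `xk` and the helper name `z_score` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 155.0, 5.4  # values used in the slide's example


def z_score(x, mu, sigma):
    """Convert x to a standard normal score: subtract the mean, divide by the std. dev."""
    return (x - mu) / sigma


z = z_score(160.0, mu, sigma)
print(f"Z for x = 160: {z:.3f}")

# Hypothetical interval, used only to illustrate the equality of probabilities.
x1, xk = 150.0, 160.0

# Probability computed on the original scale of X ...
original = NormalDist(mu, sigma)
p_x = original.cdf(xk) - original.cdf(x1)

# ... and on the standardized scale of Z.
standard = NormalDist()
p_z = standard.cdf(z_score(xk, mu, sigma)) - standard.cdf(z_score(x1, mu, sigma))

print(f"P(x1 < X < xk) = {p_x:.4f}")
print(f"P(z1 < Z < zk) = {p_z:.4f}")
```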
  • 45. Normal distribution…Questions Your sample found that the mean price of “affordable” homes in Johor Bahru, Y, is RM 155,000 with a variance of RM 3.8 × 10^7. On the basis of a normality assumption, how sure are you that: (a) The mean price is really ≤ RM 160,000 (b) The mean price is between RM 145,000 and RM 160,000 Answer (a): P(Y ≤ 160,000) = P(Z ≤ (160,000 − 155,000)/√(3.8 × 10^7)) = P(Z ≤ 0.811). Using the Z-table, P(Z > 0.811) ≈ 0.2087, so the required probability is 1 − 0.2087 = 0.7913. Always remember: to convert to SND, subtract the mean and divide by the std. dev.
  • 46. Normal distribution…Questions Answer (b): Z1 = (X1 − μ)/σ = (145,000 − 155,000)/√(3.8 × 10^7) = −1.622; Z2 = (X2 − μ)/σ = (160,000 − 155,000)/√(3.8 × 10^7) = 0.811. P(Z < −1.622) ≈ 0.0524; P(Z > 0.811) ≈ 0.2087 ∴ P(145,000 < Y < 160,000) = 1 − (0.0524 + 0.2087) = 0.7389
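The Z-table lookups in answers (a) and (b) can be cross-checked in code. The sketch below is one way to do so, assuming (as the question does) that price is normally distributed with mean RM 155,000 and variance 3.8 × 10^7; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mean_price = 155_000.0   # RM, from the question
variance = 3.8e7         # from the question
sigma = sqrt(variance)   # std. dev. is the square root of the variance

price = NormalDist(mu=mean_price, sigma=sigma)

# Z-scores for the two limits.
z1 = (145_000 - mean_price) / sigma
z2 = (160_000 - mean_price) / sigma
print(f"Z1 = {z1:.3f}, Z2 = {z2:.3f}")

# (a) P(Y <= 160,000)
p_a = price.cdf(160_000)

# (b) P(145,000 < Y < 160,000)
p_b = price.cdf(160_000) - price.cdf(145_000)

print(f"(a) P(Y <= 160,000) = {p_a:.4f}")
print(f"(b) P(145,000 < Y < 160,000) = {p_b:.4f}")
```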
  • 47. Normal distribution…Questions You are told by a property consultant that the average rental for a shop house in Johor Bahru is RM 3.20 per sq. ft. After searching, you discovered the following rental data: 2.20, 3.00, 2.00, 2.50, 3.50, 3.20, 2.60, 2.00, 3.10, 2.70. What is the probability that the rental is greater than RM 3.00?
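One possible way to approach this question is to estimate the mean and standard deviation from the ten rentals and then apply the normal model. The sketch below makes that assumption explicit: it treats rental as normally distributed with the sample estimates plugged in, which is a simplification for a sample this small (a t-based approach is an alternative). The sample mean works out to RM 2.68.

```python
from statistics import NormalDist, mean, stdev

# Rental data from the question (RM per sq. ft.).
rentals = [2.20, 3.00, 2.00, 2.50, 3.50, 3.20, 2.60, 2.00, 3.10, 2.70]

sample_mean = mean(rentals)
sample_sd = stdev(rentals)  # sample std. dev. (n - 1 in the denominator)

# Assumption: rental ~ Normal(sample_mean, sample_sd).
z = (3.00 - sample_mean) / sample_sd
p_above = 1 - NormalDist().cdf(z)

print(f"mean = {sample_mean:.2f}, std. dev. = {sample_sd:.4f}")
print(f"z = {z:.3f}")
print(f"P(rental > RM 3.00) = {p_above:.3f}")
```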
  • 48. “Student’s t-Distribution”  Similar to Z-distribution: * t(0,σ) but σ → 1 as n → ∞ * -∞ < t < +∞ * Flatter with thicker tails * As n→∞, t(0,σ) → N(0,1) * Has a function of f(t) = Γ((v + 1)/2) / (√(vπ) · Γ(v/2)) · (1 + t²/v)^(−(v + 1)/2), where Γ = gamma function; v = n − 1 = d.o.f.; π ≈ 3.1416 * Probability calculation requires information on d.o.f.
  • 49. “Student’s t-Distribution”  Given n independent measurements, xi, let t = (x̄ − μ)/(s/√n), where μ is the population mean, x̄ is the sample mean, and s is the estimator for population standard deviation.  Student’s t-distribution is the distribution of the random variable t, which is (very loosely) the "best" that we can do not knowing σ.
  • 50. “Student’s t-Distribution”  Student's t-distribution can be derived by: * transforming Student's z-distribution using * defining  The resulting probability and cumulative distribution functions are:
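  • 51. “Student’s t-Distribution”  f_r(t) = Γ((r + 1)/2) / (√(rπ) · Γ(r/2)) · (1 + t²/r)^(−(r + 1)/2) = (1 + t²/r)^(−(r + 1)/2) / (√r · B(r/2, 1/2)); F_r(t) = ∫ from −∞ to t of f_r(u) du = 1/2 + (1/2) · sgn(t) · [1 − I(r/(r + t²); r/2, 1/2)], where r ≡ n-1 is the number of degrees of freedom, -∞<t<∞, Γ(t) is the gamma function, B(a,b) is the beta function, and I(z;a,b) is the regularized beta function defined by I(z;a,b) = B(z;a,b)/B(a,b), with B(z;a,b) the incomplete beta function.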
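The t density, written in terms of the gamma function and the degrees of freedom r = n − 1, can be evaluated directly to see the "flatter with thicker tails" behaviour and the convergence to N(0, 1). The following is a sketch using the standard textbook form of the density; the function name `t_pdf` is made up for this example.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, r):
    """Density of Student's t with r degrees of freedom (standard textbook form)."""
    norm_const = gamma((r + 1) / 2) / (sqrt(r * pi) * gamma(r / 2))
    return norm_const * (1 + t * t / r) ** (-(r + 1) / 2)


std_normal = NormalDist()

# At the centre the t density is lower (flatter) than the normal density,
# in the tails it is higher (thicker), and both gaps shrink as r grows.
for r in (1, 5, 30, 200):
    print(
        f"r = {r:3d}: f(0) = {t_pdf(0.0, r):.4f} (normal {std_normal.pdf(0.0):.4f}), "
        f"f(3) = {t_pdf(3.0, r):.4f} (normal {std_normal.pdf(3.0):.4f})"
    )
```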
  • 52. Forms of “statistical” relationship  Correlation  Contingency  Cause-and-effect * Causal * Feedback * Multi-directional * Recursive  The last two categories are normally dealt with through regression
  • 53. Correlation  “Co-exist”. E.g. * left shoe & right shoe, sleep & lying down, food & drink  Indicates “some” co-existence relationship. E.g. * Linearly associated (-ve or +ve). Formula (Pearson’s r): r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) * Co-dependent, independent  But, nothing to do with a cause-and-effect (C-A-E) relationship! Example: After a field survey, you have the following data on the distance to work and distance to the city of residents in J.B. area. Interpret the results.
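A linear association between two distances can be measured with Pearson's correlation coefficient. The survey data from the slide is not reproduced in this transcript, so the sketch below uses hypothetical distances (in km) purely for illustration; `pearson_r`, `dist_work` and `dist_city` are invented names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, NOT the survey data from the slide.
dist_work = [2.0, 4.0, 6.0, 8.0, 10.0]   # distance to work (km)
dist_city = [9.0, 8.5, 6.0, 4.5, 2.0]    # distance to city (km)

r = pearson_r(dist_work, dist_city)
print(f"r = {r:.3f}")
```

A value of r close to −1 would indicate that residents living nearer to work tend to live farther from the city. As the slide stresses, this says nothing about cause and effect.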
  • 54. Contingency  A form of “conditional” co-existence: * If X, then, NOT Y; if Y, then, NOT X * If X, then, ALSO Y * E.g. + if they choose to live close to workplace, then, they will stay away from city + if they choose to live close to city, then, they will stay away from workplace + they will stay close to both workplace and city
  • 55–59. Correlation and regression – matrix approach
  • 60. Test yourselves! Q1: Calculate the mean and standard deviation (or variance) of the following data:
        PRICE - RM ‘000     130  137  128  390  140  241  342  143
        SQ. M OF FLOOR      135  140  100  360  175  270  200  170
    Q2: Calculate the mean price of the following low-cost houses, in various localities across the country:
        PRICE - RM ‘000 (x)     36   37   38   39   40   41   42   43
        NO. OF LOCALITIES (f)    3   14   10   36   73   27   20   17
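For checking hand calculations on Q1 and Q2, a short script along the following lines may help. It is a sketch only: Q1 is shown for the PRICE row (the floor-area row works the same way), the sample (n − 1) standard deviation is assumed, and Q2 uses the frequency-weighted mean Σfx / Σf.

```python
from statistics import mean, stdev

# Q1: PRICE row (RM '000).
price = [130, 137, 128, 390, 140, 241, 342, 143]
mean_price = mean(price)
sd_price = stdev(price)  # assumption: sample std. dev. (n - 1 denominator)
print(f"Q1: mean = {mean_price:.3f}, std. dev. = {sd_price:.3f}")

# Q2: grouped data, price x (RM '000) observed in f localities.
x = [36, 37, 38, 39, 40, 41, 42, 43]
f = [3, 14, 10, 36, 73, 27, 20, 17]

total_localities = sum(f)
grouped_mean = sum(fi * xi for fi, xi in zip(f, x)) / total_localities
print(f"Q2: n = {total_localities}, mean price = {grouped_mean:.2f}")
```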
  • 61. Test yourselves! Q3: From sample information, the population of a housing estate is believed to have a “normal” distribution of X ~ N(155, 45). What is the general adjustment to obtain a Standard Normal Distribution of this population? Q4: Consider the following ROI for two types of investment: A: 3.6, 4.6, 4.6, 5.2, 4.2, 6.5 B: 3.3, 3.4, 4.2, 5.5, 5.8, 6.8 Decide which investment you would choose.
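One common way to compare the two investments in Q4 is to look at return (mean) against risk (standard deviation), for instance through the coefficient of variation. This is just one possible criterion, not necessarily the one intended by the question; the helper name `summarize` is illustrative.

```python
from statistics import mean, stdev

roi_a = [3.6, 4.6, 4.6, 5.2, 4.2, 6.5]
roi_b = [3.3, 3.4, 4.2, 5.5, 5.8, 6.8]


def summarize(roi):
    """Return (mean, sample std. dev., coefficient of variation) for a series of returns."""
    m = mean(roi)
    s = stdev(roi)
    return m, s, s / m


mean_a, sd_a, cv_a = summarize(roi_a)
mean_b, sd_b, cv_b = summarize(roi_b)

print(f"A: mean = {mean_a:.3f}, std. dev. = {sd_a:.3f}, CV = {cv_a:.3f}")
print(f"B: mean = {mean_b:.3f}, std. dev. = {sd_b:.3f}, CV = {cv_b:.3f}")
```

A lower coefficient of variation means less variability per unit of return, so the choice depends on how the investor trades a slightly higher mean against a wider spread.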
  • 62. Test yourselves! Q5: Find: φ(AGE > “30-34”) φ(AGE ≤ “20-24”) φ(“35-39” ≤ AGE < “50-54”)
  • 63. Test yourselves! Q6: You are asked by a property marketing manager to ascertain whether or not distance to work and distance to the city are “equally” important factors influencing people’s choice of house location. You are given the following data for the purpose of testing: Explore the data as follows: • Create histograms for both distances. Comment on the shape of the histograms. What is your conclusion? • Construct a scatter diagram of both distances. Comment on the output. • Explore the data and give some analysis. • Set a hypothesis that the means of both distances are the same. Make your conclusion.
  • 64. Test yourselves! (contd.) Q7: From your initial investigation, you believe that tenants of “low-quality” housing choose to rent particular flat units just to find shelter. In this context, these groups of people do not pay much attention to pertinent aspects of “quality life” such as accessibility, good surroundings, security, and physical facilities in the living areas. (a) Set your research design and data analysis procedure to address the research issue (b) Test your hypothesis that low-income tenants do not perceive “quality life” to be important in paying their house rentals.