Regression-through-the-origin: Ratio of Means or Medians
                                       (A Learning Vignette)




                                    Justine Leon A. Uro
                               e-mail: justineuro@yahoo.com


                                             June 2009
                                       (1st revision 24 July 2010)1

                                      (2nd revision 24 July 2010)2

                                       (3rd revision 29 July 2010)3








                 ‘‘Regression-through-the-origin: Ratio of Means or Medians
                 (A Learning Vignette)’’ by Justine Leon Uro is licensed under
                 a Creative Commons Attribution-Noncommercial-Share Alike 3.0
                 Philippines License.

                 To view a copy of this license, visit http://creativecommons.
                 org/licenses/by-nc-sa/3.0/ph/ or send a letter to Creative
                 Commons, 171 Second Street, Suite 300, San Francisco,
                 California, 94105, USA.


  1
    The primary revision appears on p. 2 of the paper in which the estimator b is described in a more precise
manner. (J.L.A.U.; 24 July 2010)
  2
    Minor corrections done on the divisors of the weighted means (p. 2). (J.L.A.U.; 24 July 2010)
  3
    Attached a Creative Commons License. (J.L.A.U.; 29 July 2010)


Rationale and Objectives for the Vignette “Ratio of Means or Medians”

Rationale: 1) The vignette may be used as a vehicle to inform students about the
circumstances surrounding the discovery of the principle of least squares and about the
tendency of some scientists, for good reasons, to set aside newly discovered results for later
publication. 2) The vignette may also be used to emphasize the concept of least squares
using a very small dataset (three data points), thereby illustrating the principle of
parsimony and simplicity in data analysis. In so doing, the students are helped to
appreciate the concept by visualizing it instead of just memorizing formulas and a verbal
statement of the principle. The small size of the dataset should not be a source of
discouragement considering that, as they will later discover in the vignette, Gauss
predicted the position of the asteroid Ceres based on only three data points and
using the principle of least squares! 3) By using other kinds of averages to come up with
alternative estimators, the students experience the value of creativity, ingenuity, and
common sense in discovery. The use of the ratio of medians instead of the ratio of means,
as mentioned in the vignette, may also be used to introduce the students to the practice of
applying robust methods in a first analysis of a dataset before having recourse
to more complicated methods.


Objectives: After studying the vignette the students should be able to: 1) give an account
of the discovery of least squares; 2) explain the principle of least squares by using a simple
diagram; 3) appreciate the value of simplicity, parsimony, common sense, ingenuity, creativity,
communication, and accuracy in the process of discovery; 4) identify linear regression
problems wherein the ratio of medians may be more appropriate than the ratio of means;
and, 5) appreciate the value of statistics as a scientific methodology.


Scientific attitudes addressed (based on Roach (1993)) 4: Skepticism, Communication,
Accuracy, Parsimony, Common Sense


Teacher Notes: This vignette should be discussed after the lecture on linear regression.



  4
    Roach, Linda E. (1993). I Have A Story About That: Historical Vignettes to Enhance the Teaching of
the Nature of Science, p. 13.


Regression-through-the-origin: Ratio of Means or Medians
                                                      by
                                            Justine Leon A. Uro
                                   e-mail:     justineuro@yahoo.com

    An experimenter wants to determine an equation of a line that can be used to describe
a possible linear relationship between two hypothetical variables, say X and Y. It is known
beforehand that Y = 0 when X = 0. The experimenter has also obtained three additional pairs
of values for X and Y: (2,3), (3,2), and (4,7). There are a number of ways of obtaining
such a “best-fit-line.” A common (maybe the most common) method is through the use of
the least-squares (LS) criterion, in which case the estimated line is called the least-squares
(LS) line.



       What do you recall about the least-squares criterion? In what sense
       is an LS line a “least-squares” line?



    For this particular type of dataset, since the LS line has to pass through the origin (0,0)
(regression-through-the-origin method), the LS line is given by $y = bx$, where
$b = \bar{y}_w / \bar{x}_w$ and $\bar{x}_w$ and $\bar{y}_w$ are the weighted arithmetic means
(with weighting factor $w = X$) of the variables $X$ and $Y$, respectively.$^5$ That is to say,

$$ b = \frac{\sum w_i Y_i / \sum w_i}{\sum w_i X_i / \sum w_i} = \frac{\sum X_i Y_i / \sum X_i}{\sum X_i X_i / \sum X_i}, $$

or simply $b = \sum X_i Y_i / \sum X_i^2$.
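One way to see where this formula comes from is sketched below, assuming the usual reading of the LS criterion as minimizing the sum of squared vertical deviations of the observed Y values from the line. The symbol $S(b)$ is introduced here only for illustration and does not appear in the original text.

```latex
\begin{align*}
S(b) &= \sum_i \left( Y_i - b X_i \right)^2, \\
\frac{dS}{db} &= -2 \sum_i X_i \left( Y_i - b X_i \right) = 0
\quad\Longrightarrow\quad
b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}.
\end{align*}
```

Since $d^2S/db^2 = 2\sum_i X_i^2 > 0$ whenever at least one $X_i$ is nonzero, this stationary point is a minimum.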



       Construct a scatterplot of Y vs. X. Find the weighted means $\bar{x}_w$, $\bar{y}_w$,
       and an equation of the LS line. Overlay the graph of the LS line on the
       scatterplot of Y vs. X. Aside from the LS line, what other “best-fit-
       lines” have you previously encountered? How does one obtain them?
       You may want to graph one of these alternative “best-fit-lines” on
       the scatterplot of Y vs. X.
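For readers who wish to check their hand calculations, the following is a minimal Python sketch of the weighted means and the LS slope for the three data points given above. The function and variable names (`weighted_mean`, `sum_sq_dev`, `b_direct`, and so on) are illustrative choices and are not part of the vignette; plotting is left out so that the sketch needs only the standard library.

```python
# Data from the vignette; the origin (0, 0) adds nothing to the sums below.
xs = [2, 3, 4]
ys = [3, 2, 7]


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * v) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def sum_sq_dev(b):
    """Sum of squared vertical deviations of the points from the line y = b*x."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))


# Weighted means with weighting factor w = X.
x_bar_w = weighted_mean(xs, xs)  # sum(X^2) / sum(X) = 29 / 9
y_bar_w = weighted_mean(ys, xs)  # sum(X*Y) / sum(X) = 40 / 9

# LS slope, computed two equivalent ways.
b_from_means = y_bar_w / x_bar_w
b_direct = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # 40 / 29

print(f"weighted means: x = {x_bar_w:.4f}, y = {y_bar_w:.4f}")
print(f"LS slope b = {b_direct:.4f}")

# Nudging b in either direction increases the sum of squared deviations.
print(sum_sq_dev(b_direct) < min(sum_sq_dev(b_direct - 0.1), sum_sq_dev(b_direct + 0.1)))
```

The last line is a numerical check, not a proof: it only confirms that the computed slope gives a smaller sum of squared deviations than two nearby slopes.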


    5
      This is a more precise formula of the LS estimator of b than that appearing in an earlier version of this
paper, in which this author inadvertently used $b = \bar{y}/\bar{x}$, where $\bar{x}$ and $\bar{y}$ were meant to denote the
simple or unweighted arithmetic means of X and Y, respectively. It should, however, be pointed out that using
$b = \bar{y}/\bar{x}$ is of particular interest (for regression-through-the-origin) since it is the zero deviation or ZD
estimator: the sum of the deviations of the predicted from the observed Y values becomes zero for this estimator of b.
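The zero-deviation property stated in this footnote can be verified in one line. The sketch below uses $n$ for the number of data points, a symbol assumed here for illustration.

```latex
\begin{align*}
\sum_i \left( Y_i - b X_i \right) = n\bar{y} - b\, n\bar{x} = 0
\quad\Longleftrightarrow\quad
b = \frac{\bar{y}}{\bar{x}} \qquad (\bar{x} \neq 0).
\end{align*}
```

For the three data points of the vignette, $\bar{x} = 3$ and $\bar{y} = 4$, so the ZD estimate is $b = 4/3$, and the deviations $1/3$, $-2$, and $5/3$ sum to zero.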


The earliest publicized use of the LS method is probably that by Carl Friedrich Gauss
who in 1801, at 23 or 24 years old, used this method to predict the position of the asteroid
Ceres as it emerged from behind the sun, based on only three celestial observations of this
asteroid [1, 2, 3]. Based on the calculations of Gauss, Hungarian astronomer Franz Xaver von
Zach and German astronomer Heinrich Olbers rediscovered it on December 31, 1801 in
Gotha and January 1, 1802 in Bremen, respectively [1]. The asteroid was initially discovered
by the Italian astronomer Giuseppe Piazzi on January 1, 1801, but he was able to follow its
path for only 40 days before the glare of the sun got in the way [2]. Apparently Gauss had been
using the LS method since 1795 but did not publish it until 1809. By then, Frenchman
Adrien-Marie Legendre and Irish-American Robert Adrain had discovered it independently
of each other and of Gauss. Legendre published his results in 1805 and Adrain in 1808.



     Do you know of other scientists whose discoveries were first published
     by colleagues who made the same discoveries independently, albeit later?
     Please cite examples. Why do you think these discoverers delayed
     the publication of their discoveries?



   Recall that there are three common measures of average: the mean, median, and mode.
Although the most common is the mean, there are instances wherein the median or the
mode is preferred. For example, the median is preferred over the mean when the dataset is
skewed to the right in that it includes some extremely large values.
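The effect of a few extremely large values on the mean, as opposed to the median, can be seen in a small Python sketch. The sample below is hypothetical and chosen only to make the contrast obvious; it is not data from the vignette.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: one very large value among small ones.
sample = [1, 2, 3, 4, 100]

sample_mean = mean(sample)      # pulled toward the single large value
sample_median = median(sample)  # stays with the bulk of the data

print("mean:", sample_mean)
print("median:", sample_median)
```

Here the mean is 22, larger than four of the five values, while the median is 3.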



     Recall that the LS estimator for the slope of the regression-through-the-origin
     line is $b = \bar{y}_w / \bar{x}_w$ and is therefore a ratio of two averages, these
     averages being weighted means. Suggest other estimators for b based
     on your knowledge of averages (see the contents of the previous
     paragraph). Can you think of instances of datasets wherein it is better to
     use this kind of average instead of the weighted mean? Cite examples
     and justify your answer.



   In connection with the determination of the equation of a best-fit-line to describe the
relationship between two variables, H. Theil (1950; in Daniel (1991, pp. 621-2, 630)) and
E.J. Dietz (1989; in Daniel (1991, pp. 622, 630)) used the median instead of the mean.
A possible estimator for the slope of the regression-through-the-origin line based on their
theory would be b = median(y/x), the median of the individual ratios of Y to X. A similar
estimator would be b = median(y)/median(x), the ratio of the two medians.



       Obtain the regression lines based on these two estimators of b then
       graph them on the scatterplot obtained earlier. Compare the three
       regression lines obtained to each other based on their graphs.
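As a check on the hand calculations, the two median-based slopes can be computed for the three data points of the vignette with a short Python sketch. The variable names are illustrative, and the LS slope is recomputed here only for comparison.

```python
from statistics import median

# Data from the vignette.
xs = [2, 3, 4]
ys = [3, 2, 7]

# Estimator 1: median of the individual ratios Y/X.
ratios = [y / x for x, y in zip(xs, ys)]
b_median_of_ratios = median(ratios)

# Estimator 2: ratio of the median of Y to the median of X.
b_ratio_of_medians = median(ys) / median(xs)

# LS slope through the origin, for comparison.
b_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(f"median(y/x)         = {b_median_of_ratios:.4f}")
print(f"median(y)/median(x) = {b_ratio_of_medians:.4f}")
print(f"LS slope            = {b_ls:.4f}")
```

With only three points the three slopes differ noticeably (1.5, 1.0, and about 1.379), which is part of what the comparison of graphs is meant to bring out.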



   In the 1820s Gauss was able to prove that the least squares method for obtaining a best fit
line is optimal in that, among linear unbiased estimators for the coefficients of the regression
line, the LS method gives those with least variance, provided that the errors are uncorrelated
with zero mean and a common variance (the result now known as the Gauss-Markov theorem).
If the errors are, in addition, normally distributed (Gaussian distribution), the LS estimates
coincide with the maximum likelihood estimates. On the other hand, the two methods derived
from the work of Theil (1950; in Daniel (1991, pp. 621-2, 630)) and Dietz (1989; in Daniel
(1991, pp. 622, 630)) mentioned previously are robust in that they are nonparametric (no
particular distribution assumed for the errors).



       It can thus be seen that the three methods mentioned above have
       their advantages and disadvantages. What can you say about the
       relative number of calculations involved in the three different meth-
       ods? Cite instances wherein a nonparametric method is better than
       a parametric method especially when dealing with datasets that arise
       in the physical sciences.
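One situation in which a median-based slope may be preferred is a dataset with a gross outlier, for instance a single misrecorded measurement. The Python sketch below uses hypothetical data lying exactly on the line y = 2x, with the last Y value then replaced by an outlier. The data and the function names `ls_slope` and `median_ratio_slope` are assumptions made for illustration only.

```python
from statistics import median


def ls_slope(xs, ys):
    """LS slope for regression through the origin: sum(X*Y) / sum(X^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def median_ratio_slope(xs, ys):
    """Median of the individual ratios Y/X (all X assumed nonzero)."""
    return median(y / x for x, y in zip(xs, ys))


# Hypothetical data on the line y = 2x, then with the last Y misrecorded.
xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]
outlier_ys = [2, 4, 6, 8, 50]

for label, ys in (("clean", clean_ys), ("with outlier", outlier_ys)):
    print(f"{label:>12}: LS = {ls_slope(xs, ys):.3f}, "
          f"median(y/x) = {median_ratio_slope(xs, ys):.3f}")
```

In this example the single outlier moves the LS slope from 2 to about 5.6, while the median of the ratios stays at 2. This illustrates robustness on one contrived dataset and is not a general comparison of the methods.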



References

[1] Carl Friedrich Gauss. (2007, April 28, 3:27). In Wikipedia, The Free Encyclopedia.
   Retrieved April 29, 2007 from http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss.

[2] Least squares. (2007, June 30, 7:59). In Wikipedia, The Free Encyclopedia. Retrieved
   July 9, 2007 from http://en.wikipedia.org/wiki/Least_squares.

[3] Statistics. (2007, July 6, 8:05). In Wikipedia, The Free Encyclopedia. Retrieved July 7,
   2007 from http://en.wikipedia.org/wiki/Statistics.

[4] Probability. (2007, June 20, 2:37). In Wikipedia, The Free Encyclopedia. Retrieved July
   7, 2007 from http://en.wikipedia.org/wiki/Probability.

[5] Daniel, W. (1991). Biostatistics: a foundation for analysis in the health sciences, 5th
   ed., pp. 621-2, 630.




Regression-through-the-origin: Ratio of Means or Medians

  • 1. Regression-through-the-origin: Ratio of Means or Medians (A Learning Vignette) Justine Leon A. Uro e-mail: justineuro@yahoo.com June 2009 (1st revision 24 July 2010)1 (2nd revision 24 July 2010)2 (3rd revision 29 July 2010)3 C BY: $ ‘‘Regression-through-the-origin: Ratio of Means or Medians (A Learning Vignette)’’ by Justine Leon Uro is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Philippines License. To view a copy of this license, visit http://creativecommons. org/licenses/by-nc-sa/3.0/ph/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. 1 The primary revision appears on p. 2 of the paper in which the estimator b is described in a more precise manner. (J.L.A.U.; 24 July 2010) 2 Minor corrections done on the divisors of the weighted means (p. 2). (J.L.A.U.; 24 July 2010) 3 Attached a Creative Commons License. (J.L.A.U.; 29 July 2010) 1
  • 2. Rationale and Objectives for the Vignette “Ratio of Means or Medians” Rationale: 1) Firstly, the vignette may be used as a vehicle to inform the students regarding the circumstances related to the discovery of the principle of least squares and the charac- teristic of some scientists, for good reasons, to set aside newly discovered results for later publication. 2) Secondly, the vignette may also be used for emphasizing the concept of least-squares using a very meager dataset (using three data points) and hence emphasizing the principle of parsimony and simplicity in data analysis. In so doing, the students are aided in appreciating the concept to be learned by visualizing the concept instead of just memorizing formulas and the principle in words. The paucity of the dataset should not be a source of discouragement considering that, as they would later discover in the vignette, Gauss predicted the position of the asteroid Ceres based only on three data points and using the principle of least squares! 3) Together with the use of the alternative methods of averages to come up with alternative estimators, the students experience the value of cre- ativity, ingenuity, and common sense in discovery. The use of the ratio of medians instead of the means, as mentioned in the vignette, may also be used to inform the students of the principle of using robust methods for first-analysis of datasets before having recourse to more complicated methods. Objectives: After studying the vignette the students should be able to: 1) give an account of the discovery of least squares; 2) explain the principle of least squares by using a simple diagram; 3) appreciate the value of simplicity, parsimony, common sense, ingenuity, creativ- ity, communication, and accuracy in the process of discovery; 4) identify linear regression problems wherein the ratio of medians may be more appropriate than the ratio of means; and, 5) appreciate the value of statistics as a scientific methodology. 
Scientific attitudes addressed (based on Roach (1993))^4: Skepticism, Communication, Accuracy, Parsimony, Common Sense
Teacher Notes: This vignette should be discussed after the lecture on linear regression.
Footnote 4: Roach, Linda E. (1993). I Have A Story About That: Historical Vignettes to Enhance the Teaching of the Nature of Science, p. 13.
  • 3. Regression-through-the-origin: Ratio of Means or Medians, by Justine Leon A. Uro, e-mail: justineuro@yahoo.com
An experimenter wants to determine an equation of a line that can be used to describe a possible linear relationship between two hypothetical variables, say $X$ and $Y$. It is known beforehand that $Y = 0$ when $X = 0$. The experimenter has also obtained three additional pairs of values for $X$ and $Y$: (2,3), (3,2), and (4,7). There are a number of ways of obtaining such a “best-fit-line.” A common (maybe the most common) method uses the least-squares (LS) criterion, in which case the estimated line is called the least-squares (LS) line.
What do you recall about the least-squares criterion? In what sense is an LS line a “least-squares” line?
For this particular type of dataset, since the LS line has to pass through the origin (0,0) (the regression-through-the-origin method), the LS line is given by $y = bx$, where $b = \bar{y}_w / \bar{x}_w$ and $\bar{x}_w$ and $\bar{y}_w$ are the weighted arithmetic means (with weighting factor $w = X$) of the variables $X$ and $Y$, respectively.^5 That is to say, $b = (\sum w_i Y_i / \sum w_i) / (\sum w_i X_i / \sum w_i) = (\sum X_i Y_i / \sum X_i) / (\sum X_i X_i / \sum X_i)$, or simply $b = \sum X_i Y_i / \sum X_i^2$.
Construct a scatterplot of $Y$ vs. $X$. Find the weighted means $\bar{x}_w$, $\bar{y}_w$, and an equation of the LS line. Overlay the graph of the LS line on the scatterplot of $Y$ vs. $X$. Aside from the LS line, what other “best-fit-lines” have you previously encountered? How does one obtain them? You may want to graph one of these alternative “best-fit-lines” on the scatterplot of $Y$ vs. $X$.
Footnote 5: This is a more precise formula for the LS estimator of $b$ than the one appearing in an earlier version of this paper, in which this author inadvertently used $b = \bar{y} / \bar{x}$, where $\bar{x}$ and $\bar{y}$ were meant to denote the simple or unweighted arithmetic means of $X$ and $Y$, respectively.
It should, however, be pointed out that using $b = \bar{y} / \bar{x}$ is of particular interest (for regression-through-the-origin) since it is the zero deviation (ZD) estimator: the sum of the deviations of the predicted from the observed $Y$ values is zero for this estimator of $b$.
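The formulas on this slide can be checked numerically with a minimal sketch such as the one below. It assumes the three data points given in the vignette, (2,3), (3,2), and (4,7); the helper names `weighted_mean` and `deviation_sum` are illustrative and do not come from the paper. Exact fractions are used so that the weighted-mean form and the direct form of the LS slope can be compared without rounding.

```python
from fractions import Fraction

# Data from the vignette; exact fractions avoid rounding in the comparisons.
xs = [Fraction(2), Fraction(3), Fraction(4)]
ys = [Fraction(3), Fraction(2), Fraction(7)]


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def deviation_sum(b):
    """Sum of (observed Y - predicted Y) for the line y = b * x."""
    return sum(y - b * x for x, y in zip(xs, ys))


# Weighted means with weighting factor w = X.
x_w = weighted_mean(xs, xs)  # sum(X^2) / sum(X)
y_w = weighted_mean(ys, xs)  # sum(X*Y) / sum(X)

# LS slope through the origin, computed in two equivalent ways.
b_ls = y_w / x_w
b_direct = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Zero deviation (ZD) slope: ratio of the unweighted means.
b_zd = (sum(ys) / len(ys)) / (sum(xs) / len(xs))

print("weighted means:", x_w, y_w)
print("LS slope:", b_ls, "=", round(float(b_ls), 4))
print("ZD slope:", b_zd, "=", round(float(b_zd), 4))
print("deviation sums (LS, ZD):", deviation_sum(b_ls), deviation_sum(b_zd))
```

For these data the weighted means are 29/9 and 40/9, so the LS slope is 40/29 (about 1.379), while the ZD slope is 4/3 (about 1.333). The deviation sum is exactly zero for the ZD slope but not for the LS slope, which is the property the footnote describes.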
  • 4. The earliest publicized use of the LS method is probably that by Carl Friedrich Gauss, who in 1801, at 23 or 24 years old, used this method to predict the position of the asteroid Ceres as it emerged from behind the sun, based on only three celestial observations of this asteroid [1], [2], [3]. Based on the calculations of Gauss, Hungarian astronomer Franz Xaver von Zach and German astronomer Heinrich Olbers rediscovered the asteroid on December 31, 1801 in Gotha and January 1, 1802 in Bremen, respectively [1]. The asteroid had first been discovered by the Italian Giuseppe Piazzi on January 1, 1801, but he was able to follow its path for only 40 days before the glare of the sun got in the way [2]. Apparently Gauss had been using the LS method since 1795 but did not publish it until 1809. By then, Frenchman Adrien-Marie Legendre and Irish-American Robert Adrain had discovered it independently of each other and of Gauss. Legendre published his results in 1805 and Adrain in 1808.
Do you know of other scientists whose publication of a discovery was preceded by that of colleagues who made the same discovery later? Please cite examples. Why do you think these discoverers delayed the publication of their discoveries?
Recall that there are three common measures of average: the mean, median, and mode. Although the most common is the mean, there are instances wherein the median or the mode is preferred. For example, the median is preferred over the mean when the dataset is skewed to the right in that it includes some extremely large values.
Recall that the LS estimator for the slope for the regression-through-the-origin is $b = \bar{y}_w / \bar{x}_w$ and is therefore a ratio of two averages, these averages being weighted means. Suggest other estimators for $b$ based on your knowledge of averages (see the contents of the previous paragraph). Can you think of instances of datasets wherein it is better to use this kind of average instead of the weighted mean? Cite examples and justify your answer.
In connection with the determination of the equation of a best-fit-line to describe the
  • 5. relationship between two variables, H. Theil (1950; in Daniel (1991, pp. 621-2, 630)) and E.J. Dietz (1989; in Daniel (1991, pp. 622, 630)) used the median instead of the mean. A possible estimator for the slope of the regression-through-the-origin line based on their theory would be $b = \mathrm{median}(y/x)$. A similar estimator would be $b = \mathrm{median}(y) / \mathrm{median}(x)$.
Obtain the regression lines based on these two estimators of $b$, then graph them on the scatterplot obtained earlier. Compare the three regression lines with one another based on their graphs.
Gauss in 1829 was able to prove that the least-squares method for obtaining a best-fit line is optimal in that, among unbiased estimators for the coefficients of the regression line, the LS method gives those with the least variance, assuming that the errors are independently and identically normally distributed (Gaussian distribution) with zero mean and a common variance. On the other hand, the two methods derived from the work of Theil (1950; in Daniel (1991, pp. 621-2, 630)) and Dietz (1989; in Daniel (1991, pp. 622, 630)) mentioned previously are robust in that they are nonparametric (no particular distribution is assumed for the errors). It can thus be seen that the three methods mentioned above each have their advantages and disadvantages.
What can you say about the relative number of calculations involved in the three different methods? Cite instances wherein a nonparametric method is better than a parametric method, especially when dealing with datasets that arise in the physical sciences.
References
[1] Carl Friedrich Gauss. (2007, April 28, 3:27). In Wikipedia, The Free Encyclopedia. Retrieved April 29, 2007 from http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss.
  • 6. [2] Least squares. (2007, June 30, 7:59). In Wikipedia, The Free Encyclopedia. Retrieved July 9, 2007 from http://en.wikipedia.org/wiki/Least_squares.
[3] Statistics. (2007, July 6, 8:05). In Wikipedia, The Free Encyclopedia. Retrieved July 7, 2007 from http://en.wikipedia.org/wiki/Statistics.
[4] Probability. (2007, June 20, 2:37). In Wikipedia, The Free Encyclopedia. Retrieved July 7, 2007 from http://en.wikipedia.org/wiki/Probability.
[5] Daniel, W. (1991). Biostatistics: a foundation for analysis in the health sciences, 5th ed., pp. 621-2, 630.
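As a supplementary sketch, the two median-based slope estimators named in the vignette, b = median(y/x) and b = median(y)/median(x), can be computed alongside the LS slope for the three data points (2,3), (3,2), and (4,7). This assumes that only those three points enter the medians (the origin is left out because the ratio y/x is undefined there); the variable names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from statistics import median

# Data from the vignette; the origin (0, 0) is excluded by assumption,
# since y/x is undefined at x = 0.
xs = [Fraction(2), Fraction(3), Fraction(4)]
ys = [Fraction(3), Fraction(2), Fraction(7)]

# Estimator 1: median of the individual ratios y/x.
ratios = [y / x for x, y in zip(xs, ys)]
b_median_of_ratios = median(ratios)

# Estimator 2: ratio of the medians of Y and X.
b_ratio_of_medians = median(ys) / median(xs)

# LS slope through the origin, for comparison: sum(X*Y) / sum(X^2).
b_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

for name, b in [
    ("median(y/x)", b_median_of_ratios),
    ("median(y)/median(x)", b_ratio_of_medians),
    ("least squares", b_ls),
]:
    # Each fitted line is y = b * x; show the slope and its value at x = 4.
    print(f"{name}: b = {b} (about {float(b):.3f}), y(4) = {float(b * 4):.3f}")
```

For these data the three slopes are 3/2, 1, and 40/29, so the three lines through the origin are visibly different on the scatterplot even with so few points.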