QT1 - 02 - Frequency Distribution


Class notes used in Quantitative Techniques - I course at Praxis Business School, Calcutta

Published in: Education, Technology, Business

  1. 1. Frequency Distributions
  2. 2. Contents <ul><li>Basics of Data </li></ul><ul><ul><li>Samples and Populations </li></ul></ul><ul><li>Data Array </li></ul><ul><li>Frequency Distributions </li></ul><ul><ul><li>Relative Frequency Distributions </li></ul></ul><ul><li>Classes </li></ul><ul><ul><li>Qualitative versus Quantitative </li></ul></ul><ul><ul><li>Discrete versus Continuous </li></ul></ul><ul><li>Illustrating Data </li></ul><ul><ul><li>Histograms </li></ul></ul><ul><ul><li>Polygons </li></ul></ul>
  3. 3. Data Basics <ul><li>Data are collections of any number of related observations </li></ul><ul><ul><li>Number of telephones installed by all workers in one day </li></ul></ul><ul><ul><li>Number of telephones installed by one worker in one day </li></ul></ul><ul><ul><li>Number of tourists in Finland on every Diwali day ?? </li></ul></ul><ul><li>Data are useful when they </li></ul><ul><ul><li>Reveal some kind of pattern </li></ul></ul><ul><ul><ul><li>Temperature in December is less than that in June </li></ul></ul></ul><ul><ul><li>Lead to some logical conclusion </li></ul></ul><ul><ul><ul><li>Senior citizens avoid investing in equity markets </li></ul></ul></ul>
  4. 4. Data Collection & Sanity Check <ul><li>Source of Data </li></ul><ul><ul><li>Actual observation in the field </li></ul></ul><ul><ul><li>Physical records available with source organisation </li></ul></ul><ul><ul><li>Third party data sources </li></ul></ul><ul><ul><ul><li>Commercial data sellers </li></ul></ul></ul><ul><ul><ul><li>Free data sources available on the web </li></ul></ul></ul><ul><li>Basic Sanity Check </li></ul><ul><ul><li>Is the source trustworthy ? </li></ul></ul><ul><ul><li>Is there something missing in the data ? </li></ul></ul><ul><ul><li>Do we have enough observations ? </li></ul></ul><ul><ul><li>Is the conclusion logical ? Garbage In, Garbage Out </li></ul></ul><ul><ul><li>Is there double counting ? </li></ul></ul>
  5. 5. Samples and Populations <ul><li>Population : is a collection of all elements about which we are trying to draw conclusions </li></ul><ul><ul><li>Women in Calcutta with age > 18 </li></ul></ul><ul><li>Sample : is a collection of some, not all, elements of the population about which we are in a position to gather data </li></ul><ul><ul><li>Statisticians gather data from a sample and then use this data to draw inferences about the population </li></ul></ul><ul><li>Representative Sample : it should reflect the characteristics of the underlying population </li></ul><ul><ul><li>A sample of women selected from Calcutta Club may not be representative of all women in Calcutta ! </li></ul></ul>
  6. 6. Organising Data <ul><li>Organising data enables us to quickly spot some of the characteristics of the data </li></ul><ul><ul><li>Range : Highest Value ? Lowest Value ? </li></ul></ul><ul><ul><li>Clustering : Are the values grouped around a specific value ? </li></ul></ul><ul><ul><li>Popularity : Which value occurs most frequently ? </li></ul></ul><ul><li>Ways of organising data </li></ul><ul><ul><li>Simple ascending or descending order </li></ul></ul><ul><ul><li>Group by a certain characteristic </li></ul></ul><ul><ul><ul><li>Age ? Income ? Education Level ? </li></ul></ul></ul><ul><ul><ul><li>Colour ? Material ? </li></ul></ul></ul>
  7. 7. Examples of Raw Data Retail Sales Figures
  8. 8. Examples of Raw Data Forbes500 Company Data
  9. 9. Examples of Raw Data US Cereals Data
  10. 10. Examples of Raw Data Stockmarket Price Data
  11. 11. Examples of Raw Data Examination Marks
  12. 12. Examples of Raw Data Engine Pollution Data
  13. 13. Data Array <ul><li>The Data Array arranges values in ascending or descending order </li></ul>
  14. 14. Why Create a Data Array ? <ul><li>We can quickly get the highest and lowest value </li></ul><ul><ul><li>In Hydrocarbon : 0.34 .. 1.1 </li></ul></ul><ul><li>We can divide the data into sections </li></ul><ul><ul><li>First 1/3 : Between 0.34 and 0.46 </li></ul></ul><ul><ul><li>Second 1/3 : Between 0.47 and 0.56 </li></ul></ul><ul><ul><li>Last 1/3 : Between 0.56 and 1.1 </li></ul></ul><ul><li>We can see whether some value appears multiple times </li></ul><ul><li>We can observe difference between successive values of the data </li></ul>
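The four uses of a data array listed above can be sketched in a few lines of Python. The readings below are hypothetical values chosen for illustration, not the hydrocarbon data shown in the slides.

```python
# Hypothetical readings (illustration only, not the slide data).
readings = [0.52, 0.34, 0.71, 0.46, 0.46, 1.10, 0.60, 0.39, 0.56]

# The data array is simply the values arranged in ascending order.
data_array = sorted(readings)

# 1. Highest and lowest values are the two ends of the array.
lowest, highest = data_array[0], data_array[-1]

# 2. Divide the array into three sections with the same number of values.
third = len(data_array) // 3
sections = [
    data_array[:third],
    data_array[third:2 * third],
    data_array[2 * third:],
]

# 3. Values that appear more than once.
repeated = sorted({v for v in data_array if data_array.count(v) > 1})

# 4. Differences between successive values (rounded to 2 decimals).
gaps = [round(b - a, 2) for a, b in zip(data_array, data_array[1:])]

print(lowest, highest)
print(sections)
print(repeated)
print(gaps)
```

Sorting once makes all four questions cheap to answer, which is the point of the array; the next slide explains why this stops working for large data sets.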
  15. 15. Limitations of Data Array <ul><li>Cumbersome to use when the volume of data is very large </li></ul><ul><li>Utility goes down as the human mind cannot comprehend so much data in one shot </li></ul><ul><li>There is a need to compress this data and make it more accessible </li></ul>
  16. 16. Frequency Distribution <ul><li>A frequency distribution is a table that organises data into classes </li></ul><ul><ul><li>A class is a group of values describing ONE characteristic of the data </li></ul></ul><ul><li>It shows the number of observations from the data that fall into each class </li></ul><ul><ul><li>Frequency distribution can be constructed by determining how often ('with what frequency') values occur inside each class of a data set </li></ul></ul><ul><li>Fewer classes mean more data compression </li></ul>
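As a minimal sketch of the idea, the snippet below counts how many observations fall into each class. The marks and the class boundaries are hypothetical, and the sketch assumes that a value belongs to a class when it is greater than the lower boundary and less than or equal to the upper boundary.

```python
from bisect import bisect_left

# Hypothetical examination marks (illustration only).
marks = [12, 47, 35, 68, 90, 55, 47, 23, 71, 60, 38, 84]

# Upper boundary of each class; the classes are
# (0, 20], (20, 40], (40, 60], (60, 80], (80, 100].
upper_bounds = [20, 40, 60, 80, 100]

# bisect_left returns the index of the first upper boundary >= mark,
# which is exactly the class whose range contains the mark.
# Assumes every mark is > 0 and <= 100.
frequencies = [0] * len(upper_bounds)
for mark in marks:
    frequencies[bisect_left(upper_bounds, mark)] += 1

lower = 0
for upper, count in zip(upper_bounds, frequencies):
    print(f"> {lower:3d}; <= {upper:3d} : {count}")
    lower = upper
```

Twelve observations are compressed into five counts; using fewer, wider classes would compress the data further at the cost of detail.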
  17. 17. Frequency Distribution
  18. 18. Relative Frequency Distribution <ul><li>Frequency of each value can be expressed as a fraction or percentage of the total number of observations </li></ul><ul><li>This could help us compare data from samples that are of different sizes </li></ul>
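A short sketch of why relative frequencies help with samples of different sizes. Both sets of class counts are hypothetical; the function name `relative_frequencies` is introduced here purely for illustration.

```python
def relative_frequencies(frequencies):
    """Express each class frequency as a fraction of the total."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

# Hypothetical class counts from two samples of different sizes.
sample_a = [1, 3, 4, 2, 2]       # 12 observations
sample_b = [5, 15, 20, 10, 10]   # 60 observations

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)

for a, b in zip(rel_a, rel_b):
    print(f"{a:.1%}  {b:.1%}")
```

The raw counts look very different, but the relative frequencies show that the two samples are distributed across the classes in the same proportions.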
  19. 19. Discrete & Continuous Classes <ul><li>DISCRETE : In this case, the data in a class can take ONE discrete value : </li></ul><ul><ul><li>0, 1, 2, ... </li></ul></ul><ul><li>CONTINUOUS : In this case, the data in a class can take any value in a range </li></ul><ul><ul><li>> 0; <= 1 </li></ul></ul><ul><ul><li>> 1; <= 2 </li></ul></ul><ul><ul><li>> 2; <= 3 </li></ul></ul><ul><ul><li>And so on </li></ul></ul>
  20. 20. Qualitative & Continuous Classes <ul><li>Discrete Classes can also be used to model Qualitative Classes </li></ul><ul><ul><li>Where the data does not take specific numerical values but falls into certain qualitative (that is, non-numeric) categories </li></ul></ul><ul><li>Continuous classes cannot have qualitative data </li></ul><ul><ul><li>Unless you want to prove a point !! </li></ul></ul>
  21. 21. Characteristics of Classes <ul><li>All Inclusive </li></ul><ul><ul><li>All the data must fall into one class or another </li></ul></ul><ul><ul><li>Sum of relative frequencies must add up to 1 </li></ul></ul><ul><li>Mutually Exclusive </li></ul><ul><ul><li>Greater Than ( > ) Lower Class Boundary </li></ul></ul><ul><ul><li>Less Than OR Equal to ( <=) Upper Class Boundary </li></ul></ul><ul><li>First and Last Class open ended </li></ul>Example: 0 count for ratings <= 10; 1 count for ratings > 90 and <= 100; 0 count for ratings > 100
  22. 22. Constructing a Frequency Distribution <ul><li>Decide on Type of Class </li></ul><ul><ul><li>Quantitative or Qualitative measure ? </li></ul></ul><ul><li>Decide on Number of Classes </li></ul><ul><ul><li>More classes : give more information </li></ul></ul><ul><ul><li>Fewer classes : easier to interpret </li></ul></ul><ul><ul><li>Rule of Thumb : Between 6 and 15 classes </li></ul></ul><ul><li>Determine width of class interval </li></ul><ul><ul><li>Width = ([Largest Value] – [Unit Value before Smallest Value]) / [Total Number of Class Intervals] </li></ul></ul><ul><li>Determine the number of points in each class </li></ul><ul><li>Illustrate the data in a chart </li></ul>
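The construction steps can be sketched as follows. The marks are hypothetical, the number of classes (6) is the low end of the rule of thumb, and rounding the width up to a whole number is an assumption made here so that the last class is sure to contain the largest value.

```python
import math

# Hypothetical whole-number marks, so the unit value is 1.
marks = [12, 47, 35, 68, 90, 55, 47, 23, 71, 60, 38, 84]
num_classes = 6
unit = 1

# Start one unit before the smallest value so that the smallest value
# satisfies "> lower boundary" of the first class.
start = min(marks) - unit

# Width = (largest value - unit value before smallest value) / classes,
# rounded up (an assumption of this sketch).
width = math.ceil((max(marks) - start) / num_classes)

# Class i covers (boundaries[i], boundaries[i + 1]].
boundaries = [start + i * width for i in range(num_classes + 1)]

# Count the number of points in each class.
frequencies = [0] * num_classes
for mark in marks:
    index = math.ceil((mark - start) / width) - 1
    frequencies[index] += 1

print(width)
print(boundaries)
print(frequencies)
```

The last step on the slide, illustrating the data in a chart, is left out of this sketch.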
  23. 23. Using a spreadsheet to Construct a Frequency Distribution <ul><li>Functions used </li></ul><ul><ul><li>Max </li></ul></ul><ul><ul><li>Min </li></ul></ul><ul><ul><li>Roundup </li></ul></ul><ul><ul><li>Rounddown </li></ul></ul><ul><ul><li>Sum </li></ul></ul><ul><ul><li>Frequency </li></ul></ul>
  24. 24. Handling Qualitative Data <ul><li>Lookup </li></ul><ul><ul><li>Converts Text Data to Numeric data </li></ul></ul><ul><ul><li>Which is then used to create frequency distributions </li></ul></ul>
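Outside a spreadsheet, the same lookup idea can be sketched with a dictionary. The responses and the code table below are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (text data).
responses = ["Graduate", "School", "Graduate",
             "Postgraduate", "School", "Graduate"]

# Hypothetical lookup table that converts each category to a number.
codes = {"School": 1, "Graduate": 2, "Postgraduate": 3}

# Convert text data to numeric data, then count each code.
coded = [codes[r] for r in responses]
frequencies = Counter(coded)

for label, code in codes.items():
    print(f"{code} ({label}): {frequencies[code]}")
```

Each code acts as a discrete class, so the frequency distribution is just the count per code.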
  25. 25. Creating Histograms <ul><li>Histogram </li></ul><ul><ul><li>A series of rectangles, each proportional in width to the range of values in a class and proportional in height to the frequency of that class </li></ul></ul><ul><ul><li>In a spreadsheet, care must be taken to choose the values in the X-axis correctly. Could be </li></ul></ul><ul><ul><ul><li>Class Number </li></ul></ul></ul><ul><ul><ul><li>Midpoint of class </li></ul></ul></ul>
  26. 26. Creating a Frequency Polygon <ul><li>Frequency Polygon </li></ul><ul><ul><li>Another way to show the frequency of each class </li></ul></ul><ul><ul><li>X-axis value is midpoint of class </li></ul></ul>
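A minimal sketch of the points a frequency polygon connects, assuming hypothetical class boundaries and frequencies. Drawing the line is left to whichever charting tool is in use.

```python
# Hypothetical class boundaries and the frequency of each class.
boundaries = [0, 20, 40, 60, 80, 100]
frequencies = [1, 3, 4, 2, 2]

# The X-axis value of each point is the midpoint of its class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# The polygon joins these (midpoint, frequency) points in order.
polygon_points = list(zip(midpoints, frequencies))

print(polygon_points)
```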
  27. 27. Histograms & Polygons <ul><li>Histograms </li></ul><ul><ul><li>The rectangle clearly shows each separate class in the distribution </li></ul></ul><ul><ul><li>The area of each rectangle, relative to all other rectangles, shows the proportion of the total number of observations that occur in the class </li></ul></ul><ul><li>Polygons </li></ul><ul><ul><li>The polygon is simpler than its histogram counterpart </li></ul></ul><ul><ul><li>It sketches an outline of the data pattern more clearly </li></ul></ul><ul><ul><li>The polygon becomes increasingly smooth and curvelike as we increase the number of classes and the number of observations </li></ul></ul><ul><ul><li>Will lead to more significant kinds of graphs </li></ul></ul>
  28. 28. Charting Cumulative Frequencies using “Less than” Ogives
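A "less than" ogive plots, for each upper class boundary, the number of observations at or below that boundary. The sketch below computes those points from hypothetical class frequencies.

```python
from itertools import accumulate

# Hypothetical upper class boundaries and class frequencies.
upper_bounds = [20, 40, 60, 80, 100]
frequencies = [1, 3, 4, 2, 2]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))

# Each ogive point pairs an upper boundary with its cumulative count.
ogive_points = list(zip(upper_bounds, cumulative))

print(ogive_points)
```

The last cumulative value always equals the total number of observations, which is a quick check that no data point was dropped.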
  29. 29. Exotic Charts
  30. 30. What have we learnt ? <ul><li>Data Array </li></ul><ul><ul><li>Advantages and Disadvantages </li></ul></ul><ul><li>Frequency Distribution </li></ul><ul><ul><li>Relative Frequency Distribution </li></ul></ul><ul><ul><li>Cumulative Frequency Distribution </li></ul></ul><ul><li>Classes </li></ul><ul><ul><li>Qualitative </li></ul></ul><ul><ul><li>Quantitative </li></ul></ul><ul><ul><li>Discrete </li></ul></ul><ul><ul><li>Continuous </li></ul></ul><ul><li>Charts </li></ul><ul><ul><li>Histograms </li></ul></ul><ul><ul><li>Polygons </li></ul></ul><ul><ul><li>Ogives </li></ul></ul>