How to display your data

  • Skewness: deviation
  • Bell: a bell (the shape of the normal curve)
  • Gridlines: a grid of equally spaced horizontal and vertical lines
  • In its day, the following table contained medical dynamite. It presented the results of a study that showed, for the first time in vivo, that certain anti-inflammatory substances, such as indomethacin or aspirin, inhibit the synthesis of prostaglandin. Shown here is only the part of the table that includes the results from the three subjects receiving indomethacin.
  • I converted this part into a graph to show more clearly the dramatic effect of indomethacin. Note that the curves have the same type of line because they do not need to be distinguished from each other; they are all intended to show the same trend. The curves are bolder than the axes. Two zeros are used to label the point where the axes meet. Axes are of equal length. The label of the vertical axis is parallel to the axis and reads from bottom to top. Note also that the legend gives the message of the figure.
  • Start with Excel (simple graphs) & then move on to SPSS and then STATA, SAS or R
  • Clustered or grouped bar chart Stacked or segmented bar charts
  • Clear title (with the sample size). Labelled axes. No gridlines. Marital status categories are ordered by their frequency.
  • Tufte’s principle. Clear title (with the sample size). Labelled axes. No gridlines. Marital status categories are ordered by their frequency.
  • The advantage of using the frequencies is that the numbers in each category on the horizontal (X) axis can be readily seen. Using the percentage scale the percentages in each category can be easily discerned. Use of the percentage scale facilitates the comparison of groups.
  • Consider, for example, the vaginal breech births category: there are only 16 individuals in this category compared with 2221 in the normal delivery category, so vaginal breech births comprise < 1% of births. However, this is not the impression given by this three-dimensional chart. Three-dimensional charts are more visually exciting than two-dimensional charts, but they are less clear and more ambiguous. They are easily produced with computer technology and are seen more & more often in published reports. The data have only two dimensions, and a third dimension is falsely introduced in such cases. Reference: Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • In the space of a journal column (8 cm), a column chart should include only a few items. This chart is on the verge of being overcrowded. The problem can be overcome with the use of a bar chart (the computer term for horizontally arranged bars).
  • Clustered or grouped bar chart. Grouped bar charts display two or more sets of proportions.
  • Clustered or grouped bar chart
  • Segmented bar charts display three or more sets of proportions. As the number of groups to be compared increases, a grouped bar chart can quickly become very busy and obscure patterns within the data. When the number of groups to be compared becomes greater than three or four, a better type of bar chart is the segmented bar chart, where the groups are arranged on the horizontal axis and the variable being compared between the groups is arranged on the vertical axis. As the comparison of interest is between women of different ages, age should be on the horizontal axis and method of feeding on the vertical axis. From this segmented bar chart, it can easily be seen that there is a tendency for increasing breast-feeding as maternal age increases, with the exception of the oldest mothers. Note that the vertical axis has been scaled, from 0 to 100, to represent the percentage in each age group who use a particular feeding method.
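The 0 to 100 scaling described in this note amounts to converting the counts in each group into percentages of that group's total. A minimal Python sketch follows; the function name `percent_within_groups` and the counts are hypothetical illustrations, not data from the study.

```python
def percent_within_groups(counts):
    """Convert raw counts per group into percentages that sum to 100 within each group."""
    result = {}
    for group, by_category in counts.items():
        group_total = sum(by_category.values())
        result[group] = {
            category: 100 * n / group_total for category, n in by_category.items()
        }
    return result


# Hypothetical counts (NOT from the BMJ study): feeding method by maternal age group.
example_counts = {
    "under 20": {"breast": 20, "bottle": 30},
    "20 to 29": {"breast": 60, "bottle": 40},
}

for group, shares in percent_within_groups(example_counts).items():
    print(group, shares)
```

Each group's segments then stack to exactly 100, so groups of very different sizes can be compared on the same vertical axis.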
  • Use color with caution
  • Difficult to read and interpret. The area displayed should be proportional to the relative frequencies for each group. However, when the charts are displayed as three dimensional this relationship is lost, as what is displayed becomes a volume. Only the front face is proportional to the numbers in the categories and so only these should be displayed. In particular, categories with only a few individuals are given undue weight in three dimensional charts as the top face is much more prominent. Consider, for example, the vaginal breech births category: there are only 16 individuals in this category compared with 2221 in the normal delivery category, so vaginal breech births comprise < 1% of births. However, this is not the impression given in the three-dimensional chart.
  • One of the simplest ways of displaying all the data. Each point represents a value for a single individual. Always display continuous data as dotplots if the sample size per group is low (< 100 subjects).
  • Stem: stalk. Leaf: leaf. For example, for a height of 1.58 m, the leaf would be 8 and the stem would be 1.5. The number of data points in each stem can also be displayed on the left. It is a simple matter to work out the median in the stem & leaf plot. In this case there are 77 observations and thus the median is the 39th value (when the data are ordered), as 38 observations lie below this point and 38 lie above. Looking at the plot, it can be seen that the 39th value occurs in stem 1.75 and the leaf value corresponding to the 39th value is 8. Thus the median for these data is a height of 1.78 m. The stem and leaf plot resembles a histogram turned over onto its side. The advantage of a stem and leaf plot over a histogram is that not only does it show the frequency in each stem but that it retains the individual values of the data.
  • The bars may be labelled by using the midpoint of the corresponding interval, or by having a label at the start (or end) of the interval. For histograms, we recommend that you label the horizontal axis at the start (or end) of each interval, since with this method it is easier to work out the width of the interval. However, bar charts are used for discontinuous data, where the categories are entirely separate, while histograms are used for continuous data. Thus bar charts have gaps between the categories on the horizontal axis in order to emphasise that the categories are completely separate, whereas there are no spaces in between the bins for a histogram, as the width of these bins can be set by the investigator. A useful feature of a histogram is that it is possible to assess the distribution form of the data; in particular whether the data are approximately normally distributed, or are skewed.
  • The use of health-related quality of life (HRQoL) measures is becoming more frequent in clinical trials and health services research, both as primary and secondary outcomes. It is typically assessed by a self-completed questionnaire which asks a series of standardised questions about various aspects or facets of a person’s HRQoL. The Medical Outcomes Study 36-Item Short Form (SF-36) is the most commonly used HRQoL measure in the world today. It contains 36 questions measuring health across eight dimensions: physical functioning (PF); role limitation because of physical health (RP); social functioning (SF); vitality (VT); bodily pain (BP); mental health (MH); role limitation because of emotional problems (RE) and general health (GH). These eight dimensions are usually regarded as a continuous outcome and are scored on a 0–100 scale, where 100 indicates ‘good health’.
  • Keep the intervals in histogram (bin width) equal
  • Secular: centennial; occurring once every century
  • So which scale should be chosen? There is no perfect answer to this question. All present the same information, and none strictly speaking are incorrect. In this case, if I were presenting this chart without reference to any other graphics, the scale would be the first because it shows the true floor for the data (0%, which is the lowest possible value) and includes a reasonable range above the highest data point.
  • In the right-hand graph you will probably not miss the data points, as you can easily discern the change of line direction where the points have been omitted. This graph may be the more attractive of the two. Data points are probably overused in scientific papers.
  • In North America, an increase in the number of cases with TB has been observed since the mid-1980s mainly attributable to immigration, human immunodeficiency virus and the development of multidrug-resistant strains of TB.

Transcript

  • 1. How to display your data? Samir Haffar M.D. Assistant Professor of Gastroenterology
  • 2. Who might benefit? • Researchers who want to display results of their studies for publication in a journal • Readers of research literature who wish to do a critical appraisal of a piece of work • People who have to deliver a presentation
  • 3. Freeman JV, Walters SJ, Campbell MJ. How to display data. Blackwell Publishing, Massachusetts, USA, 1st edition, 2008. The best advice that a statistician can give a researcher is to first plot the data Conventional statistics textbooks give only brief details on how to draw figures & display data
  • 4. Types of data Qualitative (categorical): Dichotomous (only 2 values; present/absent, alive/dead); Nominal (unordered; blood type A, B, AB, O); Ordinal (ordered; rate pain as none, mild, moderate, severe). Quantitative (numerical): Counted (certain values, gaps; days sick/year); Continuous (range of values, no gap; blood glucose, BP in mmHg). Petrie A & Sabin C. Medical statistics at a glance. Blackwell Publishing, Massachusetts, USA, 2nd edition, 2005.
  • 5. How to present your data? • Numbers • Tables • Graphs
  • 6. Displaying your data with numbers
  • 7. Presenting numbers - 1 • Numbers expressed in numerals rather than in words • Decimal sign is a point preceded by 0 [ 0.3 not 0,3 ] • Use space to mark off thousands [ 12 345 not 12,345 ] • Remove surplus zeros: 1.6 x 10^9 bacteria/ml • Never use billion: 10^9 in USA & 10^12 in Europe • Use only one slash to express quotients of units: km/h • Use negative exponents if > 2 units [ mg·kg^-1·h^-1 not mg/kg/h ]
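The rule about marking off thousands with a space can be applied mechanically. A minimal Python sketch, assuming a helper named `format_with_spaces` (a hypothetical name, not something from the slides):

```python
def format_with_spaces(value, decimals=0):
    """Format a number with a point as decimal sign and spaces marking off thousands."""
    # Python's "," format option groups digits in threes; swap each comma for a space.
    return f"{value:,.{decimals}f}".replace(",", " ")


print(format_with_spaces(12345))   # 12 345
print(format_with_spaces(3321))    # 3 321
print(format_with_spaces(0.3, 1))  # 0.3
```

A thin or non-breaking space may be preferable in typeset text; the plain space here is only the simplest choice.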
  • 8. Presentation of numbers - 2 • Report total number of observations • Qualitative data: use both frequencies & percentages • Quantitative data: normal distribution, mean & SD (one decimal place); skewed distribution, median & IQR* * IQR: interquartile range. Freeman JV et al. How to display data. Blackwell Publishing, Massachusetts, 2008.
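As an illustration of this recommendation, the sketch below reports mean & SD for roughly symmetrical data and median & IQR for skewed data, always with the number of observations. It uses Python's standard `statistics` module; the function name `summarize`, the `skewed` flag (set by the analyst after looking at the data) and the example values are assumptions made for illustration.

```python
import statistics


def summarize(values, skewed):
    """Return a one-line summary: median & IQR if skewed, otherwise mean & SD."""
    n = len(values)
    if skewed:
        # quantiles(n=4) returns the three quartile cut points (default "exclusive" method).
        q1, median, q3 = statistics.quantiles(values, n=4)
        return f"n = {n}, median {median:.1f} (IQR {q1:.1f} to {q3:.1f})"
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return f"n = {n}, mean {mean:.1f} (SD {sd:.1f})"


data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical values
print(summarize(data, skewed=False))  # n = 7, mean 4.0 (SD 2.2)
print(summarize(data, skewed=True))   # n = 7, median 4.0 (IQR 2.0 to 6.0)
```

Quartiles can be computed by several conventions, so the IQR limits may differ slightly between software packages.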
  • 9. Use of percentages
        Total number       Percentages & decimals
        < 25               Percentages should not be used
        25 – 100           Percentages without decimals [ 7% not 7.2% ]
        100 – 100 000      Only one decimal added [ 7.2% not 7.23% ]
        > 100 000          Two decimals added [ 7.23% not 7.235% ]
    Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • 10. Normal distribution Sometimes known as Gaussian distribution Harris M, Taylor G. Medical statistics made easy. Martin Dunitz, 1st edition, London, 2003. Classic 'bell' shape Peak in the middle (mean) Symmetrical tails Mean = median = mode Mean Sum of values/number of observations Median Number of observations above = number below Mode Most frequently occurring value
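The thresholds in this slide translate directly into a small formatting rule. The Python sketch below is one possible reading of it: the name `format_percentage` is hypothetical, a total of exactly 100 is assumed to fall in the "no decimals" band (the slide's ranges overlap at 100), and reporting the raw fraction for totals under 25 is an assumption, since the slide only says percentages should not be used.

```python
def format_percentage(count, total):
    """Format count/total with the number of decimals suggested by the sample size."""
    if total < 25:
        # Percentages should not be used: fall back to the raw fraction (an assumption).
        return f"{count}/{total}"
    if total <= 100:
        decimals = 0
    elif total <= 100_000:
        decimals = 1
    else:
        decimals = 2
    return f"{100 * count / total:.{decimals}f}%"


print(format_percentage(11, 226))   # 4.9%
print(format_percentage(7, 97))     # 7%
print(format_percentage(3, 20))     # 3/20
```

The first call reproduces the 4.9% shown for the divorced/separated category (11 of 226 patients) in the leg ulcer table.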
  • 11. Standard normal distribution Chernick MR & Friis RH. Introductory biostatistics for the health sciences. John Wiley & Sons, New Jersey, USA, 1st edition, 2003 One decimal place
  • 12. P < 0.05 Value of test statistic P < 0.05 Perera R, Heneghan C & Badenoch D. Statistics toolkit. Blackwell Publishing & BMJ Books, Oxford, 1st edition, 2008. Reject the null hypothesis Results significant at the 5% level
  • 13. Skewness Perera R, Heneghan C & Badenoch D. Statistics toolkit. Blackwell Publishing & BMJ Books, Oxford, 1st edition, 2008. Peak at lower values Long tail of higher values Peak at higher values Long tail of lower values
  • 14. Skewed data distribution Peat JK et al. Health science research: a handbook of quantitative methods. Allen & Unwin, Crows Nest, Australia, 1st edition, 2001. The mean is an over-estimate of the median value Mean ≠ median ≠ mode
  • 15. Anatomy of a box-whisker plot Morgan GA et al. Understanding & evaluating research in applied & clinical settings. Lawrence Erlbaum Associates, New Jersey, USA, 2006. Especially good to show differences between groups
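A quick way to see the point about the mean over-estimating the median is to compare the two on data with a long right tail. The Python sketch below uses hypothetical values and a hypothetical helper, `skew_direction`; comparing mean and median is only a rough heuristic, not a formal test of skewness.

```python
import statistics


def skew_direction(values):
    """Rough heuristic: compare mean and median to guess the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"  # long tail of higher values pulls the mean up
    if mean < median:
        return "negative"  # long tail of lower values pulls the mean down
    return "none detected"


right_tailed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]  # hypothetical data: mean 5, median 3
print(skew_direction(right_tailed))  # positive
```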
  • 16. Box-Whisker Plot Urinary lead concentration in urban & rural children Swinscow TDV & Campbell MJ. Statistics at square one. BMJ Books, London, 10th edition, 2002.
  • 17. Central tendency & dispersion • Central tendency Mean Sum of values/number of observations Median Number of observations above = number below Mode Most frequently occurring value • Dispersion Range From lowest to highest value SD Average difference of values from mean Quartile % of observations falling between specified values Fletcher R et al. Clinical epidemiology. Williams & Wilkins, Baltimore, USA, 3rd edition, 1996.
  • 18. Fletcher R et al. Clinical epidemiology. Williams & Wilkins, Baltimore, USA, 3rd edition, 1996. Central tendency & dispersion Distribution of PSA in presumably normal men
  • 19. Bimodal distribution Data have 2 peaks There may be two different populations Each with its own central tendency Perera R, Heneghan C & Badenoch D. Statistics toolkit. Blackwell Publishing & BMJ Books, Oxford, 1st edition, 2008.
  • 20. Uniform distribution No peaks All possible values are equally likely Central tendency measure not useful Perera R, Heneghan C & Badenoch D. Statistics toolkit. Blackwell Publishing & BMJ Books, Oxford, 1st edition, 2008.
  • 21. Displaying your data with tables
  • 22. Recommendations to present data in tables – 1 • Tufte's principle • Clear title with sample size • Solid lines kept to minimum particularly vertical ones • Columns and rows clearly labeled • Rows & columns ordered by size if no natural ordering
  • 23. Recommendations to present data in tables – 2 • Numbers rounded to 2 effective digits • Qualitative data: frequency & percentage • Quantitative data: symmetrical, mean & SD; skewed, median & IQR* * IQR: interquartile range. Freeman JV, Walters SJ, Campbell MJ. How to display data. Blackwell Publishing, Massachusetts, USA, 1st edition, 2008.
  • 24. Tufte’s principle for table & graph Maximum amount of information for minimum amount of ink Tufte ER. The visual display of quantitative information. Cheshire, Connecticut: Graphics Press; 1983
  • 25. Marital status of 226 patients in leg ulcer study (title)
        Marital status        Frequency   Percent   (headings)
        Married               104         46.0      (body)
        Widowed               86          38.1
        Single                25          11.1
        Divorced/separated    11          4.9
        Total                 226         100.0
    BMJ 1998 ; 316 : 1487 – 91. (source)
    4 x 2 contingency table: 4 rows x 2 columns = 8 cells
  • 26. Marital status of 226 patients in leg ulcer study. Ordered alphabetically: hard to interpret
        Marital status        Frequency   Percent
        Divorced/separated    11          4.9
        Married               104         46.0
        Single                25          11.1
        Widowed               86          38.1
        Total                 226         100.0
    BMJ 1998 ; 316 : 1487 – 91.
  • 27. Marital status of 226 patients in leg ulcer study. Ordered by size: much easier to interpret
        Marital status        Frequency   Percent
        Married               104         46.0
        Widowed               86          38.1
        Single                25          11.1
        Divorced/separated    11          4.9
        Total                 226         100.0
    BMJ 1998 ; 316 : 1487 – 91.
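Ordering a frequency table by size and adding percentages is easy to automate. The Python sketch below uses the marital status counts from the leg ulcer table; the function name `frequency_table` is a hypothetical choice for illustration.

```python
def frequency_table(counts):
    """Return (rows, total), with rows ordered from the largest category to the smallest."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = [(label, n, round(100 * n / total, 1)) for label, n in ordered]
    return rows, total


# Counts from the leg ulcer study table, entered in alphabetical order.
marital_status = {
    "Divorced/separated": 11,
    "Married": 104,
    "Single": 25,
    "Widowed": 86,
}

rows, total = frequency_table(marital_status)
print(f"Marital status of {total} patients in leg ulcer study")
for label, n, percent in rows:
    print(f"{label:<20}{n:>5}{percent:>8.1f}")
```

The rows come out as Married, Widowed, Single, Divorced/separated, matching the "ordered by size" version of the table.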
  • 28. Displaying your data with graphs
  • 29. Table or graph? Choice between using a table or a figure not easy Nor is it easy to offer much general guidance Altman D & Bland M. Presentation of numerical data. BMJ 1996 ; 312 : 572.
  • 30. Table or graph? Graph Table Better in presentations Better in papers Freeman JV, Walters SJ, Campbell MJ. How to display data. Blackwell Publishing, Massachusetts, USA, 1st edition, 2008. Can only show summaries Can often show all the data Show only a few variables Better for multiple variables Trend better illustrated Trend badly illustrated
  • 31. Table or graph? Trend badly illustrated with a table
    Urinary excretion of PG metabolite after indomethacin administration. Urinary prostaglandin metabolite (mg/24 h)
        Subject   Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7
        I         4.8     4.8     1.8*    1.1*    1.5*    2.7     4.1
        II        3.9     4.4     0.7*    0.7*    0.7*    3.1     6.5
        III       3.8     3.0     0.5*    0.3*    0.3*    0.8     1.1
    * Subject taking indomethacin 4 x 50 mg/24 h
    Hamberg 1972
  • 32. Hamberg 1972 Urinary excretion of a prostaglandin metabolite decreased following indomethacin administration in three humans Table or graph? Trend better illustrated with a graph
  • 33. Why use graphs in a presentation? • You need to get your audience's attention • Many people respond better to visual cues than to straight text or lists of numbers • Effective graph can help drive home your point
  • 34. Software for graphs • No single package can draw all graphs to display data • Simple graphs can be drawn in Microsoft Excel • More complex graphs Major statistical packages: SPSS, STATA, SAS S-Plus for superimposing several graphs into single figure • Packages change regularly
  • 35. Types of graph • Bar/column graph & variants • Pie graph • Dot plot • Stem & leaf plot • Histogram • Box-whisker plot • Line graph • Spider or radar plot • Pictogram • Venn diagram
  • 36. Types of data • Qualitative (categorical) Dichotomous Only 2 values Nominal Unordered Ordinal Ordered • Quantitative (numerical) Counted Gaps Continuous No gaps
  • 37. Displaying qualitative data • Bar/column graph • Grouped column graph • Segmented column graph • Pie graph
  • 38. Recommendations for construction of graph • Tufte's principle • Clear title with sample size • Labeled axes • Gridlines kept to a minimum • Categories ordered by size • No three-dimensional graphs
  • 39. Tufte’s golden rule Tufte ER. The visual display of quantitative information. Cheshire, Connecticut: Graphics Press; 1983. Maximum amount of information for minimum amount of ink
  • 40. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. Columns wider than spaces between them Bars have gray tone which is more pleasing to the eye Vertical axis doesn’t extend beyond what the graph demands
  • 41. BMJ 1998 ; 316 : 1487 – 91. Column chart Marital status for 226 patients in leg ulcer study Only the height of columns presents the data of interest
  • 42. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. Tufte's principle
  • 43. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. Clear title with sample size
  • 44. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. Labeled axes
  • 45. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. No gridlines
  • 46. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. Categories ordered by size
  • 47. Column chart Marital status for 226 patients in leg ulcer study BMJ 1998 ; 316 : 1487 – 91. No three-dimensional graph
  • 48. Schott B. Schott’s almanac. London: Bloomsbury; 2006. The most populous country, Germany, can be readily seen It’s not obvious for France, Italy, & UK which has largest population Bar chart ordered alphabetically Population for 20 European countries in 2004
  • 49. Bar chart ordered by size Population for 20 European countries in 2004 It is clear now how each country relates to others for population size Schott B. Schott’s almanac. London: Bloomsbury; 2006.
  • 50. BMJ 2002 ; 324 : 643 – 6. Three-dimensional column charts Self-reported type of delivery for all new mothers (N: 3 321) Data have only two dimensions A third dimension is falsely introduced
  • 51. Two-dimensional column charts Self-reported type of delivery for all new mothers (N: 3 321) BMJ 2002 ; 324 : 643 – 6.
  • 52. Two-dimensional column charts Self-reported type of delivery for all new mothers (N: 3 321) 2 separate texts nearly run into each other Space of a journal column (8 cm) Chart on the verge of being overcrowded Problem overcome with use of bar (horizontal) chart BMJ 2002 ; 324 : 643 – 6.
  • 53. Bar chart Annual alcohol consumption per inhabitant in Europe Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • 54. BMJ 2002 ; 324 : 643 – 6. Two-dimensional column charts Self-reported type of delivery for all new mothers (N: 3 321)
  • 55. BMJ 2002 ; 324 : 643 – 6. Two-dimensional column charts Self-reported type of delivery for all new mothers (N: 3 321)
  • 56. BMJ 2002 ; 324 : 643 – 6. Grouped column graph Self-reported type of delivery for all new mothers (N: 3 321)
  • 57. Grouped column graph Probability of dying in ICU after admission with AMI Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008. 2 – 3 categories in each group should be the maximum Remove the keys
  • 58. Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008. Grouped bar chart Probability of dying in ICU after admission with AMI Remove the keys One way to remove the keys is to label first group directly
  • 59. Segmented column charts Feeding method by maternal age for all women BMJ 2002 ; 324 : 643 – 6.
  • 60. Pie chart Appropriate usage in a magazine article Large segment begins at 12 o'clock Proceed in clockwise direction Ordered by size No of observations & percentages Number of segments ≤ 5 Color employed with caution Freeman JV, Walters SJ, Campbell MJ. How to display data. Blackwell Publishing, Massachusetts, USA, 1st edition, 2008.
  • 61. Pie chart Self-reported type of delivery for all new mothers (N: 3 321)
  • 62. Pie chart Self-reported type of delivery for all new mothers (N: 3 321)
  • 63. Three-dimensional pie charts Self-reported type of delivery for all new mothers (N: 3 321)
  • 64. Pie charts Pull out the slice you want to highlight
  • 65. “The only worse design than a pie chart is several of them” Tufte ER. The visual display of quantitative information. Cheshire, Connecticut: Graphics Press; 1983
  • 66. Types of data • Qualitative (categorical) Dichotomous Only 2 values Nominal Unordered Ordinal Ordered • Quantitative (numerical) Counted Gaps Continuous No gaps
  • 67. Display quantitative data • Counted (gaps) Bar chart • Continuous (no gaps) Dot plot Stem & leaf plot Histograms Box-whisker plot
  • 68. Dot plot BAO in normal subjects & PU patients Blair AJ et al. J Clin Invest 1987 ; 79 : 582. Use dot plot if sample size per group is low (<100) Each point represents a value for a single individual Horizontal lines indicate mean values Mean values
  • 69. Dot plot BAO & PAO in normal subjects & PU patients Blair AJ et al. J Clin Invest 1987 ; 79 : 582. Substantial overlap in values among individuals in the groups
  • 70. Stem and leaf plot Height of male leg ulcer patients (n: 77) Each data value is divided into 2 parts: leaf (last digit) & stem (other part) Separate line for each different stem value Stem on left of plot & leaves on right
        Frequency   Stem    Leaf
        1           1.55-   7
        3           1.60-   333
        4           1.65-   5588
        18          1.70-   000000333333333333
        24          1.75-   555558888888888888888888   (median)
        15          1.80-   000000003333333
        10          1.85-   5555888888
        2           1.90-   13
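Because a stem and leaf plot retains the individual values, the median can be recovered from it directly. The Python sketch below rebuilds the heights from the stems and leaves on this slide and takes the middle value; the dictionary layout and the helper name `values_from_plot` are illustrative assumptions.

```python
import statistics

# Stems and leaves as shown on the slide (each leaf is the last digit of one height in metres).
plot = {
    "1.55": "7",
    "1.60": "333",
    "1.65": "5588",
    "1.70": "000000333333333333",
    "1.75": "555558888888888888888888",
    "1.80": "000000003333333",
    "1.85": "5555888888",
    "1.90": "13",
}


def values_from_plot(plot):
    """Rebuild individual values: first three characters of the stem plus the leaf digit."""
    values = []
    for stem, leaves in plot.items():
        for leaf in leaves:
            values.append(float(stem[:3] + leaf))  # e.g. "1.7" + "8" -> 1.78
    return values


heights = values_from_plot(plot)
print(len(heights))                # 77
print(statistics.median(heights))  # 1.78
```

With 77 observations the median is the 39th ordered value, which falls in the 1.75 stem with leaf 8.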
  • 71. Histogram Serum albumin in 481 white men aged over 20 BMJ 1999 ; 318 : 1667. No gaps between columns (continuous data) Keep same width of each group (bin width) Columns labeled by using midpoint, or better start or end of interval
  • 72. Histogram – Normal distribution Serum albumin in 481 white men aged over 20 BMJ 1999 ; 318 : 1667. Mean: 46.14 g/l – SD: 3.08 g/l
  • 73. Histogram – Positively skewed data Baseline ulcer area from the leg ulcer trial (n: 233) BMJ 1998 ; 316 : 1487 – 91. Peak at lower values & a long tail of higher values
  • 74. Histogram – Negatively skewed data Baseline social functioning in leg ulcer trial (n: 233) BMJ 1998 ; 316 : 1487 – 91. Long left tail of lower values & peak at higher values
  • 75. Number of categories in a histogram No hard & fast rules about appropriate number • Too few Much important information lost • Too many Patterns obscured by too much detail • Usually 5 – 15 categories will be enough
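To experiment with the number of categories, the data can be binned at several settings and the resulting counts compared. The Python sketch below builds equal-width bins, in line with the advice to keep bin width constant; the function name `equal_width_bins` and the example values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Split the range of the data into n_bins equal-width bins and count values in each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; a histogram needs a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical values
edges, counts = equal_width_bins(example, 3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [3, 3, 4]
```

Running this with, say, 5, 10 and 15 bins on real data shows where detail starts to obscure the overall pattern.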
  • 76. Number of categories in a histogram Height for leg ulcer patients (n: 233) Too few (6 categories) Freeman JV et al. How to display data. Blackwell Publishing, MA, USA, 2008. Too many (22 categories) Good (9 categories)
  • 77. Box-and-whiskers plots Gonick L & Smith W. The cartoon guide to statistics. HarperCollins Publishers, New York, USA, 1st edition, 1993 Especially good to show differences between groups
  • 78. Box-whisker plot Freeman JV et al. How to display data. Blackwell Publishing, Massachusetts, 2008. As there are many variations, you have to explain details of the plot
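Since box-whisker plots come in many variations, it helps to compute the plotted quantities explicitly and state the convention used. The Python sketch below follows one common convention (whiskers extend to the most extreme values within 1.5 x IQR of the box, and points beyond are shown as outliers); this convention, the "inclusive" quartile method, the function name `box_plot_summary` and the data are all assumptions, not necessarily what the cited book uses.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # lowest value still within the fences
        "whisker_high": max(inside),  # highest value still within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # hypothetical values
print(summary)
```

Whichever convention is chosen, the figure legend should say what the box, whiskers and individual points represent.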
  • 79. Box-and-whiskers plots Liver stiffness for each Metavir stage in CHC Vertical axis is in logarithmic scale (wide range of F4 values) Gastroenterology 2005 ; 28 : 343 – 350.
  • 80. Line graph TB mortality in England & Wales Farmer R, Lawrenson R. Lecture Notes: Epidemiology & public health medicine. Blackwell Publishing, Oxford, 5th edition, 2004
  • 81. Line graph – Arithmetic scale TB mortality in England & Wales Farmer R, Lawrenson R. Lecture Notes: Epidemiology & public health medicine. Blackwell Publishing, Oxford, 5th edition, 2004 Mortality seems hardly affected by the events They played little part in mortality decline
  • 82. Line graph – Logarithmic scale TB mortality in England & Wales Introduction of BCG vaccine & chemotherapy was associated with acceleration in established decline in mortality Farmer R, Lawrenson R. Lecture Notes: Epidemiology & public health medicine. Blackwell Publishing, Oxford, 5th edition, 2004
  • 83. It is frequently necessary to examine secular trends both as changes in rates (arithmetic scale) and as rates of change (logarithmic scale) if the nature of a trend is to be fully appreciated Farmer R, Lawrenson R. Lecture Notes: Epidemiology & public health medicine. Blackwell Publishing, Oxford, 5th edition, 2004
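The distinction between changes in rates and rates of change can be checked numerically. In the Python sketch below, a hypothetical series (not the TB mortality data) halves in every period: the absolute changes shrink over time, which is what an arithmetic scale displays, while the changes in the logarithm stay constant, which is why a steady proportional decline appears as a straight line on a logarithmic scale.

```python
import math

# Hypothetical rates per 100 000, halving in each period (NOT the TB mortality data).
rates = [400.0, 200.0, 100.0, 50.0, 25.0]

# Changes in rates: what an arithmetic scale shows.
absolute_changes = [later - earlier for earlier, later in zip(rates, rates[1:])]

# Rates of change: differences of logarithms, what a logarithmic scale shows.
log_changes = [
    math.log10(later) - math.log10(earlier) for earlier, later in zip(rates, rates[1:])
]

print(absolute_changes)                     # [-200.0, -100.0, -50.0, -25.0]
print([round(c, 3) for c in log_changes])   # [-0.301, -0.301, -0.301, -0.301]
```

A steeper segment on the logarithmic plot therefore signals a faster proportional decline, which is the acceleration the previous slide attributes to the period after BCG vaccine & chemotherapy were introduced.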
  • 84. Line chart Obesity among adults from 1990 – 2002 (US-CDC) Boslaugh S & Watters PA. Statistics in a nutshell. O’Reilly Media, California, USA, 1st edition, 2008.
  • 85. Smaller range for y-axis increases visual impact of the trend Line chart Obesity among adults from 1990 – 2002 (US-CDC) Boslaugh S & Watters PA. Statistics in a nutshell. O’Reilly Media, California, USA, 1st edition, 2008.
  • 86. Line chart Obesity among adults from 1990 – 2002 (US-CDC) Wider range for the y-axis decreases visual impact of the trend Boslaugh S & Watters PA. Statistics in a nutshell. O’Reilly Media, California, USA, 1st edition, 2008.
  • 87. Which scale should be chosen? • No perfect answer to this question All present the same information None strictly speaking are incorrect • In this case, the scale would be the first It shows true floor for data (0%, lowest possible value) It includes reasonable range above highest data point Boslaugh S & Watters PA. Statistics in a nutshell. O’Reilly Media, California, USA, 1st edition, 2008.
  • 88. Line chart Obesity among adults from 1990 – 2002 (US-CDC) Boslaugh S & Watters PA. Statistics in a nutshell. O’Reilly Media, California, USA, 1st edition, 2008.
  • 89. Line graph Effect of tyramine solution on pupillary size Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • 90. Line graph Effect of tyramine solution on pupillary size Two common defects: 1- Curves distinguished both by: - Type of line - Type of data-point symbol 2- Curves identified by separate key Reader must scan back & forth to the key to see what they represent Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • 91. Redrawn line graphs Type of data-point symbol Labeled directly Type of line Labeled directly Gustavii B. How to write & illustrate scientific papers. Cambridge University Press, Cambridge, UK, 2nd edition, 2008.
  • 92. Line graph HP seroprevalence in USA as a function of age & race Make trend lines thick for easy visibility Maximum: 3 – 4 lines Gastroenterology 1992 ; 103 : 813.
  • 93. Characteristics of some graphs Good for showing separate unrelated pieces of data Bar/column graph Good for showing Percentages Pie graph Good for showing how data changes over time Line graph
  • 94. Spider or radar plot Acupuncture vs usual care in persistent non-specific back pain BMJ 2006 ; 333 : 623 – 6. HRQoL assessed over 12 months by SF-36 SF-36 dimensions scored on a 0 (poor) to 100 (good) health scale
  • 95. Pictogram Estimated annual incidence of TB in 2006 Global tuberculosis control: surveillance, planning, financing WHO report 2008
  • 96. Venn diagram Any number of overlapping circles in theory When > 3 – 4 circles, the diagram becomes rather cluttered
  • 97. The 3 components of EBM “EBM is the integration of best research evidence with clinical expertise & patient values” - David Sackett EBM Best research evidence Clinical Expertise Patient Concerns
  • 98. Essentials you need to get started • Types of data: Qualitative - Quantitative • Null hypothesis & alternative hypothesis: H0 & H1 • What type of test? Choose the right type of test • Is it significant? Compute the value of the statistic & compare with the critical value • Software tools: Start with Excel (simple graphs) & then move on to SPSS & then STATA, SAS or R • Place of graphs in your study Perera R et al. Statistics Toolkit. Blackwell Publishing, MA, USA, 1st edition, 2008
  • 99. Useful questions to ask when considering how to display your data • What do you want to show? Type of data – Normal or skewed distribution • What methods are available for this? Table – Graph – Type of graph • Is the method chosen the best? Would another have been better? Freeman JV et al. How to display data. Blackwell Publishing, MA, USA, 1st edition, 2008.
  • 100. Suggested readings Blackwell Publishing 1st edition – 2008 Blackwell Publishing 2nd edition – 2005 Martin Dunitz 1st edition – 2003
  • 101. Thank You