Multiple Comparisons and SAS Arrays in Clinical Trials

MakroCare is a global functional service provider specialized in Biostatistics and SAS Programming. The company's state-of-the-art facility in Hyderabad, India, is staffed by highly qualified SAS programmers dedicated to biopharmaceutical development projects.


Newsletter, February 2011

Multiple Comparisons in Clinical Trials

Most doubts about the results of randomized clinical trials (RCTs) arise either from inadequate sample size or from problems of multiplicity. Problems of multiplicity arise from testing multiple hypotheses or from testing one hypothesis at multiple points in time. Common examples include: multiple analyses of accumulating data at different time points (such as frequent interim analyses), analyses of multiple endpoints, multiple subgroup analyses of subjects, multiple treatment group contrasts, and the interpretation of results from multiple clinical trials, especially in meta-analysis. Clinical trials often require a number of outcomes to be calculated and a number of hypotheses to be tested. Such testing involves comparing treatments on multiple outcome measures with univariate statistical methods. Studies with multiple outcome measures occur frequently in medical research. Some researchers recommend adjusting the p-values when clinical trials use multiple outcome measures, so that findings do not falsely claim "statistical significance". Other researchers disagree with this strategy, arguing that it is not appropriate and may lead to misleading conclusions from the study.

With multiple tests, the traditional 0.05 significance level no longer necessarily holds for the study as a whole, so the overall type I error rate needs to be controlled. In a study that includes multiple treatment groups and/or multiple endpoints, a multiple hypothesis testing procedure is used to control the type I error.

The most commonly used multiple testing procedures are:

1. Bonferroni and Sidak procedures:
The Bonferroni procedure is probably the most commonly used, because it is highly flexible, very simple to compute, and can be used with any type of statistical test. The significance level is divided by the total number of comparisons. For example, if a clinical trial compares two treatments within five subsets of subjects, the treatments are declared significantly different at the overall 0.05 level if the P value is less than 0.01 ($\alpha^* = 0.05/5$) within any of the subsets. In the Sidak procedure, a modification of Bonferroni, each test is carried out at level $\alpha^* = 1 - (1 - \alpha)^{1/K}$, where K is the number of comparisons. These methods are recommended when the comparison tests are independent.

2. Dunnett's test:
This is a classical and frequently used test for many randomized laboratory experiments, and even for clinical trials, where the means of multiple treatments are compared to that of the control; the multiple treatments are often multiple doses of the same treatment.

3. Scheffe, Tukey, and Tukey–Kramer procedures:
These procedures give great flexibility for many randomized biological experiments where variation between experimental units is not of major concern and the experimenter is not able to specify particular comparisons or contrasts in advance.

Scheffe's procedure provides a way of looking into all possible linear contrasts of K treatment means while adjusting for inflation of the type I error rate. Similarly, the Tukey and Tukey–Kramer procedures provide tests and confidence intervals for all possible pairwise comparisons of the treatment means. However, as the number K increases, these methods become quite conservative in declaring significance.

Are multiple comparisons really needed?

Here are three situations where corrections for multiple comparisons are not needed.

1. Account for multiple comparisons when interpreting the results rather than in the calculations
Testing multiple hypotheses at once creates a dilemma that cannot be escaped. If we do not make any corrections for multiple comparisons, it becomes very easy to find significant results by chance; it is too easy to make a type I error. But if we do make corrections for multiple comparisons, we lose power to detect real differences; it is too easy to make a type II error. The only way to escape this dilemma is to focus the analyses, and thus avoid making multiple comparisons. For example, if the treatments are ordered, then do not compare each mean with every other mean
(multiple comparisons); instead, perform a single test for trend to check whether the outcomes are linearly related. Similarly, if positive and negative control groups are included in addition to the experimental groups, do not include them in the ANOVA or in the multiple comparisons. Some statisticians recommend not correcting the type I error for multiple comparisons while analyzing the data. Instead, report all individual P values and confidence intervals, and make it clear that no mathematical correction was made for multiple comparisons. When interpreting these results, we need to account for multiple comparisons informally.

2. Corrections or adjustments may not be needed if we make only a few planned comparisons
Some statisticians recommend not making any formal correction or adjustment for multiple comparisons when the study focuses on only a few scientifically sensible comparisons, rather than every possible comparison. The term "planned comparison" describes this situation. A planned comparison requires that we focus on a few scientifically sensible comparisons; we cannot decide which comparisons to make after looking at the data. The choice must be based on the scientific questions we are asking, and must be made when we design the experiment.

3. Correction or adjustment for multiple comparisons is not needed when the comparisons are complementary
The example for this situation is taken from the study reported by Ridker and colleagues. They asked whether lowering LDL cholesterol would prevent heart disease in subjects who did not have high LDL concentrations and did not have a prior history of heart disease (but did have an abnormal blood test suggesting the presence of some inflammatory disease). The study included almost 18,000 subjects. Half of the subjects received a statin drug to lower LDL cholesterol and half received placebo. The investigators' primary goal was to compare the number of "end points" that occurred in the two groups, including deaths from a heart attack or stroke, nonfatal heart attacks or strokes, and hospitalization for chest pain. These events happened about half as often in subjects treated with the drug as in subjects taking placebo. The drug worked.

The investigators also analyzed each of the endpoints. Those taking the drug had fewer deaths, fewer heart attacks, fewer strokes, and fewer hospitalizations for chest pain than those taking placebo. The data from different demographic groups were then analyzed separately. Separate subgroup analyses were done for men and women, old and young, smokers and nonsmokers, subjects with and without hypertension, and subjects with and without a family history of heart disease. In each of 25 subgroups, subjects receiving the statin drug experienced fewer primary endpoints than those taking placebo, and all these effects were statistically significant. The investigators made no correction for multiple comparisons for all these separate analyses of outcomes and subgroups. No adjustments or corrections were needed, because the results are so consistent. Each of the multiple comparisons asks the same basic question in a different way, and all comparisons pointed to the same conclusion: subjects taking the drug had less cardiovascular disease than those taking placebo.

Treatment comparisons in randomized clinical trials usually involve many endpoints, so that conventional significance testing can seriously inflate the overall type I error rate. One option is to select a single endpoint for formal statistical inference, but this is not always feasible. Another approach is to apply the Bonferroni correction (i.e. multiply each p-value by the total number of comparisons). Excessive use of multiple significance tests in clinical trials can greatly increase the probability of false positive findings. The problem is made more difficult by the fact that endpoints are usually correlated, and studies often have a mixture of data types, e.g. quantitative, binary and survival data. Perhaps the most common method in the medical literature is to analyze each endpoint separately, presenting multiple p-values and an overall subjective conclusion. At best, this provides an open display of the data, enabling readers to draw their own (possibly different) conclusions. Whenever multiple comparisons take place, we need to adjust the type I error rate accordingly, except in the few situations mentioned in this paper.

SAS Arrays in Clinical Trials

Introduction:
Statistical Analysis System (SAS) is an integral part of clinical trial data management and statistical analysis. Submissions to regulatory agencies such as the FDA are commonly prepared with SAS. SAS programmers write programs in various ways to produce tables, listings and figures (TLFs); efficient programmers produce the final TLFs with few lines of code. The objective of this paper is to highlight the use of arrays in SAS programming, which can be efficient, time saving and cost effective in the pharmaceutical industry.
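The Bonferroni and Sidak adjustments discussed in the first part of this newsletter lend themselves to a first, minimal array sketch. The DATA step below is an illustration only: the five p-values are invented, and the data set and variable names (adjusted, p1-p5, bonf1-bonf5, sidak1-sidak5, k) are hypothetical. It multiplies each p-value by the number of comparisons (Bonferroni, capped at 1) and applies the Sidak formula to each.

```sas
/* Illustration only: the p-values below are invented, not study results. */
data adjusted;
    /* Raw p-values from five hypothetical subgroup comparisons */
    array p{5}     p1-p5 (0.004 0.020 0.030 0.008 0.250);
    /* Adjusted p-values are written to new variables */
    array bonf{5}  bonf1-bonf5;
    array sidak{5} sidak1-sidak5;

    k = dim(p);                           /* number of comparisons */

    do i = 1 to k;
        bonf{i}  = min(1, k * p{i});      /* Bonferroni: multiply by k, cap at 1 */
        sidak{i} = 1 - (1 - p{i})**k;     /* Sidak: 1 - (1 - p)^k                */
    end;

    drop i;
run;

proc print data=adjusted noobs;
run;
```

With these made-up inputs, the first p-value of 0.004 becomes 0.02 under Bonferroni and about 0.0198 under Sidak, so it would still be declared significant at the overall 0.05 level. In practice the same adjustments are also available through SAS/STAT procedures such as PROC MULTTEST; the array version is shown here only to connect the two topics.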
SAS System
The SAS System helps to analyze and organize a collection of data items using SAS programming statements. A SAS program is a collection of SAS statements in a logical sequence.

SAS is available in multiple computing environments such as Windows and Unix.

SAS Array
A SAS array is a temporary grouping of SAS variables that:
  • is arranged in a particular order
  • is identified by an array name
  • exists only for the duration of the current DATA step
  • is not a variable

Once the array has been defined, the programmer can perform the same task on a series of related variables, the array elements. Arrays are widely used in the pharmaceutical industry.

Arrays simplify SAS processing. They help to read and analyze repetitive data with a minimum of coding.

The ARRAY statement defines the elements in an array. These elements are processed as a group, and each element is referred to by the array name and a subscript.

Syntax:
    ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-values)>;

The ARRAY statement provides the following information about a SAS array:
  • array-name: any valid SAS name
  • subscript: the number of elements within the array
  • $: indicates that the elements within the array are character variables
  • length: a common length for the array elements
  • array-elements: the list of SAS variables to be part of the array
  • initial-values: the initial values for each of the array elements

The following special SAS variable lists reference variables that have been previously defined in the same DATA step:
  • _NUMERIC_ indicates all numeric variables
  • _CHARACTER_ indicates all character variables
  • _ALL_ indicates all variables, numeric and character (usable in an array only when the variables are all of the same type)

RULES FOR ARRAY STATEMENTS:
Some important rules to keep in mind when using arrays in SAS programs:
  • An array must contain either all numeric or all character elements, i.e. variables of mixed type are not allowed.
  • An ARRAY statement must be used to define an array before the array name can be referenced.
  • If the elements are not specified on the ARRAY statement, SAS uses the array name, appends an element number as a suffix starting at 1, and checks whether that variable name already exists in the Program Data Vector (PDV). If those variable names do not exist, the array creates them as variables in the PDV.
  • _TEMPORARY_ signals to SAS that it does not need to create actual variables in the PDV for this array; the elements of the array are held in memory but are not output as variables to the data set.
  • When an asterisk (*) is used as the subscript, SAS counts the number of array variables.
  • The array name can be any name that does not match a variable name in the data set or a SAS keyword, and it must adhere to the SAS naming convention.
  • Array names cannot be used in LABEL, FORMAT, DROP, KEEP or LENGTH statements.

USING ARRAY INDEXES:
The array index is the range of array elements.

For example, the temperature for each of the 24 hours of the day is defined as:
    array temperature_array {24} temp1-temp24;

There may be scenarios in which the index has to begin at a lower bound other than 1 (say 6) and end at an upper bound other than 24 (say 18). This is possible by modifying the subscript when the array is defined:
    array temperature_array {6:18} temp6-temp18;

The subscript is written as the lower bound and the upper bound of the range, separated by a colon.

ONE DIMENSION ARRAYS:
The ARRAY statement to define a one-dimensional array is, for example:
    array temperature_array {24} temp1-temp24;

The array has 24 elements, for the variables TEMP1 through TEMP24.

When the array elements are used within the DATA step, they are referenced by the array name and the element number. For example, the reference to the ninth element in the temperature array is: temperature_array{9}

MULTI-DIMENSION ARRAYS:
If there is more than one dimension, it is a multi-dimensional array.
For example, the ARRAY statement to define a two-dimensional array is:
    array ae_array {3, 12} aeterm1-aeterm12 Preferredterm1-Preferredterm12 visit1-visit12;

The array contains three sets of twelve elements. In the subscript, the first number gives the number of rows (first dimension) and the second number gives the number of columns (second dimension).

TEMPORARY ARRAYS:
A temporary array is an array that exists only for the duration of the DATA step in which it is defined. A temporary array is useful for storing constant values that are used in calculations. In a temporary array there are no corresponding variables to identify the array elements. The elements are defined by the keyword _TEMPORARY_.

Example:
    array systolicbp {6} _temporary_ (120 103 114 132 109 105);

EXPLICIT VS IMPLICIT SUBSCRIPTING:
Earlier versions of SAS defined arrays in a more implicit manner, as follows:
    ARRAY array-name <(index-variable)> <$> <length> array-elements <(initial-values)>;

When an implicit array is defined, every element in the array can be processed with a DO OVER statement. An index variable may be indicated after the array name. For example:

    *** Implicitly subscripted array;
    DATA ftoc;
      /* Only the first seven readings on each data line are read */
      INPUT month $ f1-f7;
      ARRAY f(i) f1-f7;
      ARRAY c(i) c1-c7;
      /* Convert each Fahrenheit reading to Celsius */
      DO OVER f;
        c=(f-32)*5/9;
      END;
      FORMAT c1-c7 4.1;
      CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
      TITLE1 "DATA: FTOC";
      TITLE2 "Implicit Array Example";
    RUN;
    TITLE;

This differs from the explicit array discussed previously, where a constant value or an asterisk, given as the subscript, denotes the array bounds. For example:

    *** Explicitly subscripted array;
    DATA ftoc2;
      /* Only the first seven readings on each data line are read */
      INPUT month $ f1-f7;
      ARRAY f{7} f1-f7;
      ARRAY c{7} c1-c7;
      /* Convert each Fahrenheit reading to Celsius */
      DO i=1 TO 7;
        c{i}=(f{i}-32)*5/9;
      END;
      FORMAT c1-c7 4.1;
      CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
      TITLE1 "DATA: FTOC2";
      TITLE2 "Explicit Array Example";
    RUN;

SORTING ARRAYS:
CALL SORTC can be used to sort character variables and CALL SORTN can be used to sort numeric variables. An example of sorting several numeric variables is as follows:

    data _null_;
      array xarry{6} x1-x6;
      set datasetname;
      /* Sort the values of x1-x6 into ascending order within each observation */
      call sortn(of x1-x6);
    run;

Following are some of the functions widely used with arrays.

HBOUND FUNCTION:
This function returns the upper bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, HBOUND returns the upper bound of the dimension, a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to hbound(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the HBOUND function for multidimensional arrays. Both methods return the same value for HBOUND, as shown in the table that follows the SAS code example.

    array mult{2:6,4:13,2} mult1-mult100;

    Syntax           Alternative Syntax    Value
    HBOUND(MULT)     HBOUND(MULT, 1)       6
    HBOUND2(MULT)    HBOUND(MULT, 2)       13
    HBOUND3(MULT)    HBOUND(MULT, 3)       2

LBOUND FUNCTION:
This function returns the lower bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, LBOUND returns the lower bound of the dimension, a value of 2. SAS repeats the statements in the DO loop five times.
    array big{2:6} weight sex height state city;
    do i=lbound(big) to hbound(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the LBOUND function for multidimensional arrays. Both methods return the same value for LBOUND, as shown in the table that follows the SAS code example.

    array mult{2:6,4:13,2} mult1-mult100;

    Syntax           Alternative Syntax    Value
    LBOUND(MULT)     LBOUND(MULT, 1)       2
    LBOUND2(MULT)    LBOUND(MULT, 2)       4
    LBOUND3(MULT)    LBOUND(MULT, 3)       1

DIM FUNCTION:
This function returns the number of elements in a dimension of an array (for a one-dimensional array, the total number of elements).

Example 1: One-dimensional Array
In this example, DIM returns a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to dim(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the DIM function for multidimensional arrays. Both methods return the same value for DIM, as shown in the table that follows the SAS code example.

    array mult{5,10,2} mult1-mult100;

    Syntax       Alternative Syntax    Value
    DIM(MULT)    DIM(MULT, 1)          5
    DIM2(MULT)   DIM(MULT, 2)          10
    DIM3(MULT)   DIM(MULT, 3)          2

Arrays Vs Proc Transpose
To transpose data (turning variables into observations or turning observations into variables), one can use either PROC TRANSPOSE or array processing within a DATA step.

A Simple Transposition:
Consider a simple situation in which the program should transpose observations into variables. In the original data, each person has 3 observations. In the final version, each person should have just one observation.

In the "before" scenario, the data are already sorted BY NAME DATE:

    NAME   DATE
    Amy    Date #A1
    Amy    Date #A2
    Amy    Date #A3
    Bob    Date #B1
    Bob    Date #B2
    Bob    Date #B3

In the "after" scenario, the data will still be sorted by NAME:

    NAME   DATE1      DATE2      DATE3
    Amy    Date #A1   Date #A2   Date #A3
    Bob    Date #B1   Date #B2   Date #B3

The PROC TRANSPOSE program is as follows:

    PROC TRANSPOSE DATA=OLD OUT=NEW PREFIX=DATE;
      VAR DATE;
      BY NAME;
    RUN;

The PREFIX= option controls the names of the transposed variables (DATE1, DATE2, etc.). Without it, the names of the new variables would be COL1, COL2, etc. PROC TRANSPOSE also creates an extra variable, _NAME_. _NAME_ has a value of DATE on both observations, indicating the name of the transposed variable.

The equivalent DATA step code using arrays could be:

    DATA NEW (KEEP=NAME DATE1-DATE3);
      ARRAY DATES {3} DATE1-DATE3;
      /* Read the three observations for one person, then output one row */
      DO I=1 TO 3;
        SET OLD;
        DATES{I} = DATE;
      END;
    RUN;

The programmer can choose either PROC TRANSPOSE or arrays in the DATA step.

Conclusion:
Arrays play a vital role in SAS programming for clinical trial data management and statistical analysis. Because arrays can reduce CPU time, are cost effective and reduce repetitive coding, they are a good choice for SAS programmers in their daily programming activities.

About MakroCare
MakroCare is a global drug development services firm that operates through 4 main divisions: CRO, SMO, Informatics and Consulting. It offers integrated and innovative services in the areas of regulatory affairs, risk management, site management, patient recruitment, trial management (P II/III and late phase), biometrics, QA audits, PV/Safety, and informatics. www.makrocare.com
