A sequential test procedure for monitoring a singular safety and efficacy outcome

In this note we describe a modification of the sequential probability ratio test (SPRT) developed for the purpose of “flagging” a significant increase in the mortality rate of a treatment relative to a control, while ensuring that double-blinding and the Type I error for the primary test of efficacy, also based on mortality rates, are not compromised.


Introduction
Drug trials go through different phases. In phase I trials the primary concern is safety, the subjects are typically healthy volunteers or patients, and the primary objective is to determine the maximum tolerated dose (MTD). This is followed by a phase II trial, which builds upon the results of the phase I trial.
The primary goal of a phase II trial is to determine the optimal method of administration and examine potential efficacy. If the phase II trial demonstrates that the drug may be reasonably safe and potentially effective, a phase III trial may be carried out. The primary goal of a phase III trial is to compare the effectiveness of the new treatment with that of existing treatments or placebo.
In a majority of phase III clinical trials subject measurements can be divided into distinct efficacy and safety variables.
With the exception of truly sequential trials, the primary efficacy variables are usually analyzed at a few well-defined points in time (typically based upon calendar time or accrual milestones). In contrast, safety is monitored continuously through the generation of adverse event (AE) reports to the respective institutional review board (IRB) and review by the principal investigator (PI), with summary data safety monitoring board (DSMB) reports generated at regular intervals, e.g. quarterly or bi-yearly. In a majority of high-risk clinical trials the DSMB will also monitor the flow of AE reports in real time.
In the specific case of double-blind controlled trials of a new treatment versus standard of care, or placebo, how might one monitor safety "continuously" and efficacy at distinct points in time when the primary safety and efficacy variable is mortality? In addition, if the DSMB wishes to remain blinded to treatment assignment unless there is a true safety issue, what type of information is needed in order for them to make an informed decision in conjunction with real-time monitoring of AE reports? Generating a formal statistical rule to tackle this problem requires the principal investigators to design the trial around the efficacy outcome in the traditional sense, while the DSMB needs to determine which differences between the new therapy and control in the opposite (unsafe) direction are unacceptable.
In this note we develop a one-sided safety monitoring rule based upon a modification of Wald's classical sequential probability ratio test (SPRT). The DCA-MALA trial, used as the example throughout, was designed as a randomized, double-blind, placebo-controlled single-center trial. The trial was originally designed to enroll n=1500 subjects, but was terminated early for financial reasons. For this specific trial of dichloroacetate (DCA) versus placebo the measure of efficacy was 28-day mortality, and the primary measure of safety was also 28-day mortality.

Hutson, 2003 Safety and efficacy outcome. Trop J Pharm Res, December 2003; 2 (2) 199
Ultimately the DSMB wanted a simple statement after each block of subjects completed the trial: "remain blinded at this point in time" or that the "blind be broken at this point in time." Unblinding the study is used to mean that group summary statistics will be analyzed and presented in an unblinded fashion.
This does not necessarily mean that data at the individual subject level are unblinded. Also note that the principal investigators would remain blinded even if the DSMB were to take an unblinded look at this trial. The use of this rule still allowed us to monitor secondary measures of safety in the standard way [3][4]. In addition, adverse events involving mortality were still monitored on a case-by-case basis (in a blinded manner).

Statistical Methods
We employ a version of a sequential test to generate a safety monitoring rule that flags a problem for the DSMB when there is a disproportionate number of deaths on a new therapy relative to the standard of care or placebo. Note that this rule does not terminate the clinical trial; it only suggests unblinding the trial for the purpose of more intense scrutiny. The sequential test is based upon a modification of the SPRT, initially developed by Wald [1], which is a procedure for testing a simple null hypothesis versus a simple alternative hypothesis continuously in time.
With respect to the new safety monitoring plan, "continuously in time" will refer to blocks of subjects who complete the trial, as opposed to testing after individual subjects complete the trial. Therefore, in order to implement this rule effectively the trial should follow a randomized block design. Let $N = n_1 + n_2$ denote the total number of subjects for the new treatment plus the standard treatment, and let $K$ denote the number of blocks. For the purposes of our safety monitoring plan we set the null "efficacy" hypothesis $H_0$ to correspond to the original study design and the alternative "safety" hypothesis $H_1$ to correspond to unsafe rates of the new therapy relative to control. Let $p_1$ and $p_2$ denote the event rates of the new therapy and control group, respectively. Then the safety data monitoring rule (SDMR) consists of testing after blocks of $N_i = n_{1i} + n_{2i}$ subjects complete the study. The DSMB chairperson is notified of the result after the test is carried out for each block.
We strongly recommend that the values for $p_1$ and $p_2$ corresponding to $H_0$ be chosen based upon the original study design pertaining to efficacy. For example, in the DCA-MALA trial the mortality rates from which the trial was designed were $p_1 = 0.19$ and $p_2 = 0.25$, for DCA and placebo, respectively. For $H_1$, the "safety" hypothesis, the DSMB, with the guidance of a simulation study, deemed $p_1 = 0.28$ to be an unacceptable death rate in the DCA arm given a placebo death rate of $p_2 = 0.25$, after accounting for statistical noise. The mathematical details for the efficacy and safety "functions" are contained in Appendix A. Through simulations we illustrated that if the placebo death rate is lower than anticipated, the decision to recommend unblinding the trial will come earlier given the same absolute difference in adverse event rates, e.g. $p_1 = 0.18$ versus $p_2 = 0.15$ for DCA relative to placebo. Note that the unblinding rule can be easily modified to accommodate other types of outcomes such as mean differences.
We will typically choose $\beta$ to be much smaller than $\alpha$ if the primary goal is safety monitoring. For the DCA-MALA trial we determined that the appropriate levels would be $\alpha = 0.20$ and $\beta = 10^{-8}$, such that it would be unlikely that the test terminates early and we accept $H_0$, yet the test would terminate quickly if there is a safety concern. Setting $\alpha = 0.20$ and $\beta = 10^{-8}$ corresponds to the stopping bounds of $A = 9.9999$ and $B = 0.00001$. Therefore, if the test errs it will err on the conservative side in terms of unblinding the study early. The parameters $\alpha$ and $\beta$ may be adjusted by the DSMB in order to relax or tighten the monitoring rule during the course of the study.
Using the candidate choices of $\alpha = 0.20$ and $\beta = 10^{-8}$ (approximately zero) in the DCA-MALA trial, it was determined through simulation that the test statistic $\Lambda_i$ would never be less than $B$ prior to the first planned interim analysis at 50% accrual. If this scenario did occur it would have indicated that DCA is substantially more effective than originally anticipated and that the safety of DCA in terms of the mortality rates shouldn't be of concern to the EAC at that point in time. Therefore, we propose that if $\Lambda_i < B$ at any point in the study, the SPRT decision rule be reset at block $i-1$, i.e. restart the safety monitoring rule one block back relative to the current point in time (block $i$) as the new "time 0." This provides a one-block "burn-in" period to reset the rule. Even after resetting the test statistic due to the outstanding performance of the new therapy, a short run of deaths favoring control could occur such that $\Lambda_i$ reverses its path and crosses $A$. This rare event (given a long-term past history of treatment efficacy) would not stop the trial; however, the recommendation to unblind the trial at that point in time would be made to the DSMB Chair. It would then have to be determined whether this run was due to chance alone, or to some deterministic cause such as a bad batch of drug.
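The flag-and-reset logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `monitor` and its inputs are hypothetical, the per-block likelihood-ratio contributions are taken as given (their exact form is in Appendix A, which is not reproduced here), the reset is interpreted as restarting the statistic with the current block as the single burn-in block, and monitoring is assumed to stop once unblinding is recommended.

```python
def monitor(block_ratios, A, B):
    """Apply the flagging rule to per-block likelihood-ratio contributions.

    block_ratios[i-1] is the (hypothetical) likelihood ratio contributed by
    block i alone; the running statistic is their cumulative product.
    Returns a list of (block, statistic, recommendation) tuples.
    """
    decisions = []
    stat = 1.0
    for i, ratio in enumerate(block_ratios, start=1):
        stat *= ratio
        if stat >= A:
            decisions.append((i, stat, "Unblind the Study"))
            break  # assumption: the DSMB takes over once the flag is raised
        if stat <= B:
            # Reset: block i-1 becomes the new "time 0", so the restarted
            # statistic contains only the current (burn-in) block.
            stat = ratio
            decisions.append((i, stat, "Remain Blinded (rule reset)"))
        else:
            decisions.append((i, stat, "Remain Blinded"))
    return decisions


# Made-up contributions: strong early evidence for efficacy triggers a
# reset at block 2, then a run of unfavorable blocks raises the flag.
example = monitor([0.5, 0.125, 4.0, 4.0, 4.0], A=8.0, B=0.1)
for row in example:
    print(row)
```

In this made-up sequence the product of all five contributions is only 4, so without the reset the flag would not have been raised by block 5; with the reset, the run of unfavorable blocks is judged on its own.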

Simulation Study
The following simulation study illustrates the proportion of times out of 10,000 simulations that the decision to "remain blinded" or "unblind the study" would be made, for several combinations of $\alpha$ and $\beta$ (with values such as 0.05 and 0.10), in a clinical trial with sample size fixed at n=1500, broken into K=75 blocks. The decision to "reset the trial," as described above, is built into the simulation study. The numbers are similar to the operating characteristics of the DCA-MALA trial; however, there are no planned interim analyses. In addition, the median time at which the test statistic crosses $A$ or $B$ is given. For any specific trial of interest a similar simulation study should be undertaken in order to determine the appropriate parameter values for $\alpha$ and $\beta$.
In this specific simulation study the efficacy hypothesis $H_0$ was fixed at $(p_1, p_2) = (0.20, 0.25)$ as determined by a hypothetical trial design. Assume that the DSMB decided the safety hypothesis $H_1$ should be $(p_1, p_2) = (0.30, 0.25)$. The simulations were then carried out given different scenarios of "true" mortality rates $(p_1, p_2)$. The pairs $(p_1, p_2)$ were set to (0.15, 0.25), (0.20, 0.25), (0.25, 0.25), (0.30, 0.25), (0.20, 0.15), and (0.15, 0.10), corresponding to the new treatment being more efficacious than planned, the new treatment being efficacious as planned, the new treatment being equivalent to placebo, the new treatment being worse than placebo at the correct placebo rate, the new treatment being worse than placebo at a lower placebo mortality rate, and the new treatment being worse than placebo at a very low and unexpected placebo mortality rate, respectively. The simulation results are provided in Tables 1-9.
Since we are primarily interested in testing for safety, the choice of $\alpha$ and $\beta$ comes down to a tradeoff between stopping times and Type I error for this application. The column labeled "Median Sample Size" indicates the median time at which a decision rule is to be implemented if the true underlying mortality rates were $p_1$ and $p_2$. If we were confident of the underlying truth with respect to $p_1$ and $p_2$ in terms of efficacy, then it becomes a question of trade-offs: what safety error rate is the DSMB willing to live with? For example, from Table 1, if the DSMB chose $\alpha = 0.05$ and $\beta = 10^{-7}$ then we would indicate to the DSMB that they would likely "unblind" the trial early 6.3% of the time over theoretical repetitions of the study, and "unblind" the trial early 97.6% of the time if there was a true safety concern. Note that if the new treatment was more successful than anticipated the "unblinding rate" goes down to 0.5%.
If the DSMB wanted a very stringent rule then they might go with $\alpha = 0.2$ and $\beta = 10^{-7}$; see, e.g., Table 3. A possible tradeoff would be to choose $\alpha = 0.1$ and $\beta = 0.1$ in Table 8, where we would unblind the study at a rate of 12.3% if we were close to the proportions around which the study was planned, and unblind the study 98.9% of the time if we were close to the safety hypothesis.
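A simulation of this kind might be sketched as follows. Everything specific here is an assumption made for illustration: the per-block statistic is a normal approximation to the difference in observed mortality rates (the statistic actually used is defined in Appendix A and may differ), the bounds are Wald's classical approximations, and the function names are hypothetical. The sketch is therefore not expected to reproduce the tabulated percentages.

```python
import math
import random


def block_log_ratio(deaths_new, deaths_ctl, n, h0, h1):
    """Log likelihood ratio (H1 versus H0) contributed by one block.

    ASSUMPTION: normal approximation to the difference in observed
    mortality rates; the Appendix A statistic may be different.
    """
    diff = deaths_new / n - deaths_ctl / n

    def log_density(p1, p2):
        mean = p1 - p2
        var = p1 * (1 - p1) / n + p2 * (1 - p2) / n
        return -0.5 * math.log(2 * math.pi * var) - (diff - mean) ** 2 / (2 * var)

    return log_density(*h1) - log_density(*h0)


def unblinding_rate(truth, h0, h1, alpha, beta,
                    n_blocks=75, n=10, n_sims=500, seed=2003):
    """Proportion of simulated trials in which unblinding is recommended."""
    rng = random.Random(seed)
    log_a = math.log((1 - beta) / alpha)   # Wald's approximate upper bound
    log_b = math.log(beta / (1 - alpha))   # Wald's approximate lower bound
    flagged = 0
    for _sim in range(n_sims):
        log_stat = 0.0
        for _blk in range(n_blocks):
            # Simulate deaths in each arm of the block under the "true" rates.
            deaths_new = sum(rng.random() < truth[0] for _ in range(n))
            deaths_ctl = sum(rng.random() < truth[1] for _ in range(n))
            inc = block_log_ratio(deaths_new, deaths_ctl, n, h0, h1)
            log_stat += inc
            if log_stat >= log_a:
                flagged += 1
                break
            if log_stat <= log_b:
                log_stat = inc  # reset with a one-block burn-in
    return flagged / n_sims


H0, H1 = (0.20, 0.25), (0.30, 0.25)
rate_safety = unblinding_rate((0.30, 0.25), H0, H1, alpha=0.05, beta=1e-7)
rate_better = unblinding_rate((0.15, 0.25), H0, H1, alpha=0.05, beta=1e-7)
print("true safety problem:", rate_safety)
print("better than planned:", rate_better)
```

The number of simulated trials is kept small here so the sketch runs quickly; a study meant to guide the choice of $\alpha$ and $\beta$ would use many more repetitions, as in the 10,000 used for the tables.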

DCA-MALA Trial Example
The following text appeared (modified slightly for this paper) in each DCA-MALA DSMB report following the adoption of the sequential test, called the safety decision monitoring rule (SDMR), in the DCA-MALA trial. "After every 20 subjects (one randomization blocking unit) have completed the study the biostatistics coordinating center will "update" the SDMR and recommend to the DSMB to either remain blinded to the treatment assignment or to unblind the treatment assignment due to a significant increase in the mortality rate, beyond random noise, of the DCA group relative to the placebo group. In addition to the SDMR, mortality data will always be monitored on a case-by-case basis, i.e. if a sequential series of anomalous deaths occur in any given block the DSMB will be notified immediately, regardless of the SDMR."
The original trial was designed to enroll n=1500 subjects if accrual through two interim analyses reached 100%. Unfortunately, the trial was terminated early, after only n=123 subjects had completed the study, due to problems stemming from failing to meet accrual milestones set by the sponsors.
The low accrual rates were directly related to unusual dry spells occurring during the rainy seasons when malaria is prevalent. However, enough data were gathered to demonstrate how the SDMR works in practice.
In order to illustrate the new SDMR to the DCA-MALA DSMB prior to their approval or disapproval, the following examples were presented to the committee members. The method was illustrated using various made-up outcomes, which were similar to what we anticipated might possibly occur during the DCA-MALA trial given $H_0$: efficacy ($p_1 = 0.19$, $p_2 = 0.25$) and $H_1$: safety ($p_1 = 0.28$, $p_2 = 0.25$). The results are presented in Tables 10-14. After every $N_i = 10 + 10$ subjects completed the study, the test statistic $\Lambda_i$ was calculated along with the corresponding recommendation: "Remain Blinded" or "Unblind the Study." The example given in Table 14 is the only case where DCA mortality was consistently lower than placebo mortality, and hence the decision was always to "Remain Blinded." In all other examples the decision to unblind the study was a function of the true underlying mortality rates. In our opinion these simple examples helped illustrate the utility of the SDMR to the committee, which ultimately endorsed its implementation.
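A block-by-block illustration of this kind could be generated along the following lines. The death counts below are invented for the sketch and are not the entries of Tables 10-14; the per-block statistic is again an assumed normal approximation to the difference in observed mortality rates rather than the Appendix A formula; the function names are hypothetical; and the bounds are simply the values of $A$ and $B$ quoted in the text.

```python
import math


def block_log_ratio(deaths_new, deaths_ctl, n, h0, h1):
    """Log likelihood ratio (H1 versus H0) contributed by one block.

    ASSUMPTION: normal approximation to the difference in observed
    mortality rates; the Appendix A statistic may be different.
    """
    diff = deaths_new / n - deaths_ctl / n  # p1-hat minus p2-hat

    def log_density(p1, p2):
        mean = p1 - p2
        var = p1 * (1 - p1) / n + p2 * (1 - p2) / n
        return -0.5 * math.log(2 * math.pi * var) - (diff - mean) ** 2 / (2 * var)

    return log_density(*h1) - log_density(*h0)


def recommendations(blocks, h0, h1, A, B, n=10):
    """Return (block, p1_hat, p2_hat, statistic, recommendation) rows."""
    rows, log_stat = [], 0.0
    for i, (deaths_new, deaths_ctl) in enumerate(blocks, start=1):
        inc = block_log_ratio(deaths_new, deaths_ctl, n, h0, h1)
        log_stat += inc
        if log_stat <= math.log(B):
            log_stat = inc  # reset with a one-block burn-in
        stat = math.exp(log_stat)
        advice = "Unblind the Study" if stat >= A else "Remain Blinded"
        rows.append((i, deaths_new / n, deaths_ctl / n, stat, advice))
        if advice == "Unblind the Study":
            break  # assumption: reporting stops once the flag is raised
    return rows


H0 = (0.19, 0.25)          # efficacy hypothesis (p1, p2)
H1 = (0.28, 0.25)          # safety hypothesis (p1, p2)
A, B = 9.9999, 0.00001     # bounds as quoted in the text

# Invented outcomes: (deaths on DCA, deaths on placebo) per block of 10+10.
worse = recommendations([(5, 2)] * 4, H0, H1, A, B)
better = recommendations([(1, 3)] * 5, H0, H1, A, B)
for row in worse + better:
    print(row)
```

With the invented counts, the sequence in which DCA mortality is consistently higher leads to an "Unblind the Study" recommendation after a few blocks, while the sequence in which it is consistently lower stays at "Remain Blinded" throughout.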

Results for the DCA-MALA Trial
In this section we illustrate how the SPRT decision rule worked within the context of the DCA-MALA trial through 120 subjects given $\alpha = 0.20$ and $\beta = 10^{-8}$, corresponding to boundaries of $A = 9.9999999$ and $B = 0.0000111$. The trial was terminated due to financial circumstances after n=123 subjects were enrolled. Hence, the final 3 subjects' data were not included in the safety monitoring statistic illustrated here.

$\hat{p}_1$ and $\hat{p}_2$ denote the estimated mortality rates in the DCA and placebo treatment groups for each block of 20 subjects. As can be seen, the value of the test statistic $\Lambda_i$ started to "drift" toward $A = 9.9999999$ as the imbalance in mortality rates favored DCA and then started to "drift" back towards $B = 0.0000111$ as the mortality rates became more balanced. The simplicity of programming this method is illustrated via the SAS program used to carry out the calculations, given in Appendix B.

Conclusion
In this note we presented a statistical decision rule for data safety monitoring purposes when the primary efficacy and primary safety endpoint of a clinical trial is mortality. This rule was designed for ease of interpretation by DSMB members with little or no formal statistical training. The goal of this method is to provide a means of approximately controlling the Type I error for the efficacy analysis, while monitoring safety in a continuous fashion. As was discussed above, the method may be modified to accommodate more complex designs. Future work will involve studying the probability theory behind the utilization of different sequential bounds for efficacy and safety, such that this information can be incorporated into the sample size parameters during the design phase of the trial.

Test Statistic. Denote the estimates of the proportions $p_1$ and $p_2$ as
$$\hat{p}_{1i} = \frac{\text{number of mortalities in the new therapy arm}}{n_{1i}}, \qquad \hat{p}_{2i} = \frac{\text{number of mortalities in the control arm}}{n_{2i}},$$
where $i = 1, 2, \ldots, K$, and again $K$ denotes the number of blocks. The safety monitoring test statistic $\Lambda_i$ is then simply a function of $\hat{p}_{1i}$ and $\hat{p}_{2i}$, updated after every block $i$. For the DCA-MALA study we set $K = 75$ and $n_{1i} = n_{2i} = 10$ based upon efficacy considerations. Note that $n_{1i}$ and $n_{2i}$ have to be large enough to produce a meaningful value for $\Lambda_i$. The details for the calculation of $\Lambda_i$ are contained in Appendix A. If $\Lambda_i \ge A$ then stop and reject $H_0$ (recommend unblinding the study). If $\Lambda_i \le B$ then stop and accept $H_0$ (reset the monitoring rule); otherwise continue to monitor the safety of the study. The consequences of resetting the monitoring rule are examined further elsewhere in this note. Classical statistical theory dictates that $A \le (1-\beta)/\alpha$ and $B \ge \beta/(1-\alpha)$; treating these bounds as equalities is adequate for our purpose. This is demonstrated via the simulation study in Section 3. The parameters $\alpha$ and $\beta$ correspond roughly to traditional fixed sample size Type I and Type II error rates.
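For reference, Wald's classical approximations to the SPRT stopping bounds, $A \approx (1-\beta)/\alpha$ and $B \approx \beta/(1-\alpha)$, are easy to compute. The short Python sketch below uses a hypothetical helper name, `wald_bounds`, and illustrative values of $\alpha$ and $\beta$ taken from the simulation settings; it assumes these standard approximations, and the exact convention used in Appendix A may differ.

```python
def wald_bounds(alpha, beta):
    """Wald's approximate SPRT stopping bounds (A, B).

    alpha and beta play the role of nominal Type I and Type II error
    rates. ASSUMPTION: the classical approximations are used as-is.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    upper = (1.0 - beta) / alpha   # crossing above A rejects H0 (flag)
    lower = beta / (1.0 - alpha)   # crossing below B accepts H0 (reset)
    return upper, lower


A, B = wald_bounds(alpha=0.05, beta=1e-7)
print(A, B)
```

Choosing $\beta$ far smaller than $\alpha$ pushes $B$ toward zero, which is what makes an early "accept $H_0$" decision unlikely while leaving the upper bound $A$ within reach when there is a safety problem.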

Table 1: Simulation Results

Table 2: Simulation Results

Table 3: Simulation Results

Table 5: Simulation Results