Dr. Bialkowski walks us through ways to improve Clinical Trials through metrics. This content was presented as part of an on-demand webinar which is available at bioclinica.com/resources/webinars.
How do we define study quality? After all, a study has many components and players.
Boil it down to essentials (CLICK): a quality study is one that gives you what you need, when you need it.
What you need: basically, a set of data whose sample size is sufficient to support statistical analysis. And how do you get that? Well, you need to have
Enough study sites, so that you can recruit
Enough subjects matching your particular inclusion criteria,
And provide them enough study drug over the course of the study
To generate a sufficient volume of data, AND you need
That data to be of high quality so that your conclusions are reliable.
That, in a nutshell, is what you need out of a study. And, as I mentioned before, a quality study will get you those things when you need them:
Various deliverables or milestones come in when you’ve planned them to.
It means that you do all this within the constraints of your budget, which should have some assumptions built in for the imperfections of clinical trials—for example, your budget might include buffers for a certain percentage of screen failures, early terminations, lost or expired study drug, etc.
And, finally, it could mean that you need certain deliverables to fit the schedules of shared resources, such as data managers or monitors who might be on multiple studies at one time.
If we think of study quality in these terms, then measuring study quality becomes a matter of asking a series of big questions: Do I have the subjects, the data, the documents, the study drug that I need? Is everything still on time, based on my original plans? And, am I staying within the constraints of my budget? (CLICK)
Answering those questions can give you a read on the quality of your particular study: Is it humming along according to plan, is it wavering just enough to be of concern, or do you have a full-scale emergency on your hands?
Answering those questions, and acting on them in a deliberate manner, depends upon the thoughtful use of metrics in the management of your study. Metrics provide sponsors a framework to understand and measure, at a glance, the various activities comprising the trial. They let you set expectations, and then assess site and study performance against those expectations.
The challenge, obviously, is knowing which metric to use and when to use it, particularly when eClinical systems—EDC, IVR/IWR, CTMS—contain so much data that can be leveraged for performance metrics.
When you try to consider all of the possibilities, it can become difficult to know which area or topic to focus on.
Even once you decide on a topic—enrollment, say, or data collection—you then have to decide on the type of metric or metrics to build and present for any one topic.
Then, quite possibly, you become this guy, paralyzed by the sheer number of choices.
I’m hoping today to provide some ideas and tactics that will help avoid this kind of analysis paralysis.
First, I think it’s important to understand the different types of metrics that you can develop.
Every study is different, and you’re going to look at certain topics in different ways than you would other topics. A particular type of metric, however, is consistent no matter the subject matter. Understanding how each type is calculated, and where it can and cannot be applied, will help with your selection.
Second, I believe it’s essential to identify your key drivers of study quality, and phrase each of them as a question. Then you find the metric that answers that question.
Third, I feel it’s important to map out the sequence in which you’ll ask these questions and use metrics to answer them. It’s impossible to weigh every metric at once, but more importantly you don’t need to: certain questions will be more urgent than others, depending on the audience and the time in the study. From these primary questions, other questions might emerge as follow-ups. Figuring out these sequences will help you decide which metrics to focus on.
I believe if you follow through with these 3 steps, the metrics you need to manage your study quality, and the order in which you’ll consult and use them, will reveal themselves.
Let’s start off with a quick refresher on what we’re talking about when we refer to metrics.
What is a metric? Well, in its most basic sense, it’s a measurement. But measurements don’t just happen. Instead, they’re actually the product of three different questions:
What do you want to measure?
How do you want to measure it?
Why do you want to measure it?
The three are interrelated. The what seems like it’s always there, but your understanding of it is determined by how you measure it, and your decision to do so is based on certain motivations.
If you start with a metric and work backwards, you’ll have an answer in search of a question. If, instead, you take the time to answer these three questions up front, you’ll end up with metrics that serve a clear purpose.
I’ve highlighted the “How” because I’m going to try to describe some of your options here, the different types of metrics you can build.
There are many methods to choose from, many different types of metrics, and each type has its own uses, its own strengths and weaknesses. A particular metric will answer certain questions, but not others. Understanding these uses and limitations will, I believe, help you later when you want to choose which metrics to use for your study.
Here I’ve listed some of the most common metric types on the right side, moving from the most specific at the top down to the most summarized at the bottom. In fact, KPIs don’t really “summarize” so much as render judgment, but we’ll come back to that in a bit.
On the left I’ve listed the levels on which metrics might focus in clinical development, again from the most specific level of the subject all the way to cross-study, program assessments. These don’t match up 1-to-1 with metric types on the right side, but they do illustrate how one progresses on this scale.
Add time as a dimension in these and you start to inch into trending and forecasting.
To illustrate this, I’m going to take you through an exercise using a simple set of sample data involving 10 sites, each of which has a target enrollment, and each of which enrolled subjects in the same 15-month time period. Obviously in real life you’re not going to have such overlapping windows for different sites, but I’ve simplified things here for the purposes of illustration.
Now let’s look at a couple of different ways of visualizing that data.
The chart on the left does this on a site-by-site basis, and it’s not very helpful. I probably want to know it at some point, but it has too much noise and, just as importantly, no context: I don’t know how any of these sites is doing. Site 1003 is significantly lower than the rest, but that’s not necessarily bad if that site is not expected to enroll as many subjects.
The chart on the right is a little more helpful. First, it simplifies things into the standard enrollment chart you know: blue columns show study enrollment by month, the red line shows the target. It tells you, at a glance, what the numbers are and how to interpret them, i.e. how many subjects have I enrolled, and how am I doing?
But, its usefulness ends there, doesn’t it? After all, I can see that I’m below my enrollment target and have been since May of 2012, but I don’t know why. I don’t know whether all sites are under-enrolling, or only certain ones. You can already see then how not just the type of metric, but the level of focus—at a study level, at a site level, even across multiple studies—can impose some limits on the amount of information you can pack into a particular metric.
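Under the hood, that study-level chart boils down to a simple calculation: cumulative actual enrollment compared against a cumulative target. A minimal sketch, with invented monthly counts and an invented flat target:

```python
# Hypothetical monthly enrollment counts vs. a flat monthly target.
actual_by_month = [4, 7, 5, 3, 6]   # subjects enrolled each month (invented)
monthly_target = 6                   # planned enrollment per month (invented)

# Cumulative actual enrollment (the "blue columns").
cumulative_actual = []
running = 0
for n in actual_by_month:
    running += n
    cumulative_actual.append(running)

# Cumulative target (the "red line").
cumulative_target = [monthly_target * (i + 1) for i in range(len(actual_by_month))]

# Are we at or above target each month?
status = [a >= t for a, t in zip(cumulative_actual, cumulative_target)]
print(cumulative_actual)   # [4, 11, 16, 19, 25]
print(cumulative_target)   # [6, 12, 18, 24, 30]
print(status)              # [False, False, False, False, False]
```

Like the chart itself, this tells you that you are behind target, but nothing about why.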
So let’s look at things another way. This time we’ll compare each site’s relative performance against target on a monthly basis. This normalizes things in such a way that, regardless of whether we expect a site to enroll 60 subjects or 15, we’re looking at them the same way.
The lefthand screenshot visualizes this data using sparklines, showing each site’s deviation from target over time (the horizontal axis). The nice thing about these, as you can see, is that you get a visual indicator of each individual site’s performance above or below its horizontal target line in a single, easily digestible glance.
The chart on the right combines all sites into a single view so I can see who is under- or over-performing throughout the study. This starts to tell an interesting story—I see a lot of fluctuation early in the study, as expected, but over time I should see scores start to converge on that 0% axis. They’re converging, but on the negative side of the axis. In this case, I’m not just dealing with one or two underperforming sites; many are underperforming, which raises the question: were my assumptions unrealistic?
Now, these have their drawbacks—the righthand chart might still be a bit noisy, and neither one lets me know actual targets. But, in working with our raw measurements in different ways, I’m starting to find different ways to measure performance and add context.
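The normalization behind these charts can be sketched as a relative-deviation calculation; the site IDs and counts below are invented for illustration:

```python
# Hypothetical per-site enrollment: actual vs. expected-to-date.
sites = {
    "1001": {"actual": 12, "expected": 10},
    "1002": {"actual": 4,  "expected": 8},
    "1003": {"actual": 3,  "expected": 4},
}

# Relative deviation from target: 0.0 means exactly on target, so a
# 60-subject site and a 15-subject site are read on the same scale.
deviation = {
    site: (d["actual"] - d["expected"]) / d["expected"]
    for site, d in sites.items()
}
print(deviation)  # {'1001': 0.2, '1002': -0.5, '1003': -0.25}
```

Site 1002 is 50% behind its own target even though site 1003 has fewer subjects in absolute terms—exactly the context the raw counts were missing.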
And, once we’ve done some summations and calculations with our data, we can turn them into a visual indicator (often also called a KPI), where we don’t even care about the score; we just want to know whether it’s red, yellow, or green. In this case, I’ve taken a site’s relative performance against target, and assigned thresholds above and below that target for scoring a site as red, yellow, or green.
This is the most easily digestible type of metric and the ideal for scorecards where you just want to know whether something is good or bad. The drawback of it, obviously, is that a KPI by itself has no information about the underlying data; in this case it tells us nothing about actual enrollment, nothing about a site’s target, nothing in fact about how far above or below target it is; only its relative score or risk.
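As a sketch, a traffic-light KPI of this kind just maps relative deviation onto a color; the thresholds below are invented for illustration, not a standard:

```python
def enrollment_kpi(relative_deviation, yellow=-0.10, red=-0.25):
    """Map a site's relative deviation from target to a traffic-light KPI.

    Thresholds are invented: within 10% of target is green, 10-25%
    below target is yellow, more than 25% below target is red.
    """
    if relative_deviation <= red:
        return "red"
    if relative_deviation <= yellow:
        return "yellow"
    return "green"

print(enrollment_kpi(0.05))   # green
print(enrollment_kpi(-0.15))  # yellow
print(enrollment_kpi(-0.40))  # red
```

Note how the underlying numbers disappear entirely once the color is assigned—which is both the KPI’s strength and its limitation.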
If we return to our diagram from earlier, I’m hoping you have a sense of how these different types of metrics are interconnected for a single topic. Enrollment, for example, starts with individual records, which you aggregate; then you make some simple calculations from them (rate, average); then you set a target and measure deviation from it.
Also, one level of metric leads to another. Each of them is able to answer certain questions, but not others: for example, I have 25 subjects at site 82. Is that good or bad? Well, I have to look at their variance from target, which is another metric.
Once you understand how some of these different types of metrics function, you can build a toolkit, and pick and choose the most appropriate type of metric given the situation.
Having talked a bit about how you can measure things, let’s talk about what you can measure.
As I mentioned earlier, there’s an abundance of things that you could track.
Here I’ve laid out a sampling of the types of information that can be pulled from your various eClinical systems: EDC, IVR/IWR, CTMS, safety. For any one of these, you can build a variety of different individual metrics of the types we just discussed. The MCC has 54 key clinical trial metrics.
The challenge, though, is deciding what areas to focus on, and then the level of detail that you need. I would argue that the best way to resolve this dilemma is not to think in terms of what data you could measure. Instead, ask yourself what questions or information needs you will have in managing your study.
More specifically, think about all the different people who have a part in making a study happen, and what key questions of theirs metrics might be able to answer. Then ask yourself, for each of them:
Who is your audience?
What is the key question they need answered?
What will happen based on the answer?
Questions are specific. They’re for a specific audience, at a specific point in time, and the answer to them should direct you to some sort of outcome.
If you don’t understand your audience, you might hear back “That’s not what I need.”
If you don’t know exactly what you want to communicate, you might hear back “I don’t know what you’re trying to tell me.”
If you don’t anticipate the next step that follows the answer, you might hear back “What am I supposed to do with this?”
If, on the other hand, you understand all three of these, then you’ll find it much easier to settle on a topic for your metric and the type of metric. Take the first one, for example: here, it doesn’t matter what the actual enrollment numbers are; it does not even matter what the exact progress toward target of each site is; all the audience cares about is knowing whether any sites are below target, and how many.
I know what you’re thinking, though: there’s never just one question.
Take enrollment as an example again. The simplest question is, is my study enrolling enough subjects today?
It’s a simple yes/no question. If the answer is Yes, then end of discussion, right? Well, not necessarily.
You might start to ask additional questions to predict whether you’re actually trending toward future problems, whether it’s by lagging your target or by going over budget due to an excess of screen failures or early terminations. You might probe to find out to what extent your potential problems cut across the whole study, and to what extent they are isolated to particular sites.
And, even if everything is going swimmingly, you may look for more subtle risk—areas where nothing is going wrong, but if something did go wrong, it could derail your study (for example, randomization putting a disproportionate number of subjects in one treatment arm at a particular site).
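That randomization example can be probed with a simple per-site check; the site IDs, counts, and the 70% threshold are all invented for illustration:

```python
# Hypothetical subjects per treatment arm at each site.
arm_counts = {
    "1001": {"A": 9, "B": 8},
    "1002": {"A": 14, "B": 4},  # disproportionate allocation
}

def imbalanced(counts, max_share=0.70):
    """Flag a site if any single arm holds more than max_share of its subjects."""
    total = sum(counts.values())
    return total > 0 and max(counts.values()) / total > max_share

flagged = [site for site, c in arm_counts.items() if imbalanced(c)]
print(flagged)  # ['1002']
```

Nothing is “wrong” at site 1002 yet, but a check like this surfaces the latent risk before it becomes a problem.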
Now, having a lot of questions is not a bad thing. In fact, what we have here is an emerging picture of how you can sequence your questions and the metrics that will answer them.
In the above example, certain questions might become pressing if the answer to the first question is “Yes,” because you want to start to look at potential future problems. If, on the other hand, your first answer is “No, I’m not enrolling enough subjects,” then you know you have a problem and are going to ask a lot of different questions to answer why.
What do I mean by sequence your questions? Well, most key questions in study management are not standalones. They don’t exist in a vacuum. They’re important to specific people at specific points in the study, and based on the answer that a metric provides to them, additional questions might arise.
Anticipating these additional questions will help you order or sequence your metrics. You’ll find that certain questions come before others and certain metrics will therefore come before others, either because logically it makes sense, or because certain questions are more pressing than others.
Here I’ve tried to group these sequences into major categories. They’re not exact, but they do, I think, illustrate the point.
Assess: First come the metrics that you use to provide quick assessments and convey some judgment. KPIs are the best example of this.
Analyze: Next come the metrics that help you to analyze and contextualize that initial assessment. For example, if your first metric told you something is wrong, the second level is where you start to figure out why exactly things are trending in the wrong direction. It’s possible you’ll have to look at more than one set of metrics to do this. Or, if everything looked good at that first level, you might come to this second level to identify certain outliers that might portend a future problem.
Action: Finally, on the right are metrics that complete your analysis by identifying the root cause of an issue and, in doing so, trigger decisive action.
It may not always take 3 steps to get you to action; sometimes it’s 1, sometimes 2, sometimes more. The important thing, however, is to start to plan the connections between the questions you are asking, the answers your metric may provide, and which questions or action those answers might lead to.
So how can we put this together for study metrics?
I’m going to start by looking at the needs based on timepoint in the study, and assume that, depending on whether it’s early or late in the study, some topics or concerns may be more pressing than others. To give a very simple example, data lock isn’t all that important early in the study. There are many questions you could ask, but at any particular point in your study, what are your most immediate ones? What keeps you up at night?
I believe that if you identify these questions, the metrics that will provide the answers you need, and the sequence of additional questions or actions that follow, you’ll develop an effective means of using metrics to improve and maintain your study quality.
Let’s look at the early days of a study as an example: you’ve started to open up sites and enroll subjects. I would say at this point data lock isn’t all that important. Dropout rates may not be that important because it’s early and you don’t have many subjects, but you might be interested in them if there are early enrollment problems.
I’ve offered on the left a handful of metrics that meet some of the most pressing questions:
Do I have the number of sites and subjects I’m supposed to have?
Data entry: are my sites already in the habit of entering the data they’re supposed to? Are any of them generating more queries than others?
Have there been any SAEs?
That’s just the first level of questions you might ask. Sometimes one question is enough, and a bad answer will lead to action—in the above example, an SAE triggers an immediate call on the site. In other cases, the answers you get to these first questions may lead to a second set of questions.
(Click for animation)
At that point, we can identify the metrics that will provide answers to our next set of questions:
For example, if you don’t have enough sites yet, a rundown of site counts by status will tell you whether you have enough in your pipeline to get back on track. If not, you then might go another step down to identify why more sites aren’t coming online: are they behind schedule? Are they not collecting all of their documentation? Or do you just not have enough in the pipeline?
Similarly, if you’re behind schedule on enrollment, another layer of questions will get to the root cause: are you screen failing too many subjects? Are you losing subjects after randomization? Why? Or, are you not screening enough subjects at all?
If you aren’t screening enough subjects, site statuses will also give you a sense of whether you have the sites in your pipeline to make up the shortfall. Similarly, screen failures and early terminations will almost always be of interest, but particularly at this point to see if either can explain lagging enrollment numbers.
On the right, I’ve also indicated that, in consulting these metrics, it’s advisable to build in the capability to drill from study-wide to site-by-site details as needed. This will help you identify potential problem areas in a study that is on-average proceeding according to plan. It will also help you as you proceed through your sequences of metrics and try to identify root causes.
For example, let’s say I’m underenrolling at the study level. I can look and see I’m doing fine on site activation, but screen failures are too high. I can then drill into those screen failures more closely, see that 2 of my sites have disproportionately high screen-fail rates, and that they’re screening a lot of subjects who don’t meet criteria. I know I need to refocus those sites.
Or, I don’t have a lot of screen failures…basically, everything is going according to plan except I’m not enrolling as many subjects as I thought I would. Therefore, I messed up my planning, and probably need to start to recruit more sites.
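The drill-down in this example amounts to comparing each site’s screen-fail rate against the study-wide rate; the counts and the 1.5× outlier threshold below are invented:

```python
# Hypothetical screening outcomes per site.
screening = {
    "1001": {"screened": 30, "failed": 3},
    "1002": {"screened": 20, "failed": 12},
    "1003": {"screened": 25, "failed": 4},
}

# Study-wide screen-fail rate.
total_failed = sum(s["failed"] for s in screening.values())
total_screened = sum(s["screened"] for s in screening.values())
study_rate = total_failed / total_screened  # 19/75, about 0.25

# Drill down: which sites fail screens at well above the study-wide rate?
outliers = [
    site for site, s in screening.items()
    if s["failed"] / s["screened"] > 1.5 * study_rate
]
print(outliers)  # ['1002']
```

Here the study-wide number alone would just say enrollment is lagging; the per-site rate comparison points at where to act.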
Sample dashboard. Emphasize the last few slides have talked through selecting and sequencing your questions and the metrics they need.
Translate that into action by building your dashboards.
Don’t go overboard—I suggest a small # of dashboards using the metrics of most importance.
Use judgment as to whether to include only metrics from that “Assess” level, or some of the followup “Analyze” metrics as well. Here, I have done that by showing site statuses and screen fail and early termination rates. In other cases you might want to simplify it even more.
If we move on to the midpoint in your study, you probably have the most moving parts. At this point I’m assuming that all sites are online, data entry is going smoothly with focus starting to shift to SDV and data lock. There should be an overall focus on continuing good enrollment trends, identifying supply-chain snags, and gathering quality data.
So what are some of the metrics that might be more useful at this time? What will answer some of your first questions?
Enrollment: similar to early study. Making sure it stays on target, and if not, identify where the problem is via screen fails and early terminations.
Late action items: answer “has anything not been done that was supposed to be done?”
Supply chain: study drug inventory to show whether sites have enough drug.
Second level might start to answer why: are they getting shipments? Are we losing study drug?
Out of window subjects: similar to action items: should see nothing here; if you see anything, take immediate action.
Data management: now starting to see if data is being SDV’d and locked in a timely fashion, and what kind of trends we’re seeing in query volume.
Second level starts to drill down into reasons for anomalies in any of these.
Sample dashboard. Emphasize the last few slides have talked through selecting and sequencing your questions and the metrics they need.
Translate that into action by building your dashboards.
Don’t go overboard—I suggest a small # of dashboards using the metrics of most importance.
Milestones: putting them more front and center because all of those late-study milestones are coming up—last visits, last form, data lock, site close-out, etc.
CRF statuses: should show that last forms are received and counts of locked forms are creeping up until all are locked.
Maybe query metrics are a second level there—if it looks like forms aren’t being locked quickly enough, why not? Perhaps too many open queries?
I’ve only put two levels here because the concerns have gotten much narrower: closing out milestones, and locking the data.
Perhaps refine some of the milestones even further with conditional formatting—highlight in yellow or red given just how late something is.
We might also select and sequence our metrics by audience. For example, here are key metrics selected and ordered for a data manager.
Top level really has two things: % counts of forms received, SDV’d, and locked; and queries.
Depending on answers to those, they can choose to look at the next level of metrics with more or less urgency
Data entry: who is falling behind on entering forms.
CRF status: if forms aren’t verified or locked, where else are they in being processed? (This obviously depends on your use of other status options in an EDC; if you don’t use them, this could be dropped.)
Additional query metrics: drill in to details to understand
Whether certain forms are triggering more queries than others.
Whether a large number of queries are awaiting action.
If there is a pile of queries awaiting responses, the data manager has enough information to take action. But, he or she could also go one layer deeper at that point and investigate whether there are patterns in answering or accepting them more or less quickly.
Sample dashboard. Emphasize the last few slides have talked through selecting and sequencing your questions and the metrics they need.
Translate that into action by building your dashboards.
Don’t go overboard—I suggest a small # of dashboards using the metrics of most importance.
One parting bit of information:
There are stages to full implementation of a plan around metrics. We’ve focused today primarily on the Define and Value stages.
Those are just some examples of how you might develop strategies to use metrics at various points in your study. I’d like to emphasize that these are examples—every sponsor is a bit different, every study is a bit different, but I’m hoping that some of the principles and strategies I’ve outlined for selecting and using metrics will help you.
I’d like to emphasize again that a metric is a tool. In and of itself it does nothing. You need to not only choose the right set of metrics and sequence your use of them, but also to follow through on that while managing your study.
Second, the use of metrics is an iterative process, one that will take continuous refinement to get the right set of metrics for your particular study, and you may make adjustments over the course of a single study, or as you start to experiment with program-wide assessments. There will be some trial and error to find the right set of questions, but that’s OK. Even the best plans take minor adjustments once they’re implemented.
Finally, walk before you run. It’s very easy to get overwhelmed, and to try to do everything at once. I’d encourage people instead to start small, to start with a focused set of metrics that answer your most targeted questions and build from there. Over time, you’ll develop a large toolkit of metrics, but it will be a structured set—you’ll know how they relate to one another, and how they translate into action.