The metastatic setting has traditionally been the initial venue for the development of new drugs in breast cancer. This seems like a natural population in which to start because patients have often exhausted the armamentarium of approved agents and, therefore, clinical equipoise is easier to achieve. These trials, however, often involve sicker patients who can confound the results, and do not directly offer patients the chance of cure.1–4 Furthermore, these trials are often lengthy and expensive because they typically depend on survival endpoints in order to gain regulatory approval or to be considered for use in earlier disease stages. Adjuvant trials, which study therapies administered following surgical intervention, may offer patients an increased chance of cure, but they also generally depend on survival endpoints for analysis, which, in the case of estrogen receptor positive (ER+) breast cancer, can take years (or even decades) to complete.5
Neoadjuvant trials, in contrast, which study therapies administered prior to surgical intervention and commonly utilize the post-operative surrogate endpoint of pathologic complete response (pCR), offer a more efficient and potentially beneficial framework for drug development. In breast cancer, neoadjuvant trials have become an increasingly common setting in which to develop and gain approval for new drugs.1,6–8 Pertuzumab, for example, a monoclonal antibody used to treat human epidermal growth factor receptor 2 (HER2) positive (HER2+) breast cancer, was originally approved by the US Food and Drug Administration (FDA) in the metastatic setting based on the primary endpoint of overall survival (OS) in the CLEOPATRA study.9 However, FDA approval for the use of pertuzumab in the neoadjuvant setting was based on a primary endpoint of pCR in the NeoSphere study.6 This decision was somewhat controversial because the relationship between pCR and hard survival endpoints is not straightforward,10 and the definition of pCR can vary between studies.11 This paper will discuss the basis for using pCR as a surrogate endpoint in breast cancer, compare the validity of pCR amongst different immunohistochemistry-based receptor subtypes (or simply, receptor subtypes) of breast cancer, examine the relationship between pCR and survival, and evaluate whether or not pCR meets the criteria of a valid and useful surrogate endpoint.
Pathologic complete response and risk of breast cancer recurrence
In individual studies comparing all patients who achieved pCR after chemotherapy with those who did not, pCR consistently appeared to predict long-term survival. In 1999, Kuerer et al. first described tumour characteristics of 372 patients with locally advanced breast cancer who underwent neoadjuvant chemotherapy with fluorouracil, doxorubicin and cyclophosphamide.12 While only 16% of patients in the study achieved a pCR, those who did had significantly improved 5-year OS and disease-free survival (DFS) (89% and 87%, respectively) as compared with those without pCR (64% and 58%, respectively). Since Kuerer’s original publication, pCR has repeatedly predicted individual long-term survival in a large number of studies.13–30 Many of these trials included patients with all receptor subtypes of breast cancer, however, and subsequent subgroup analyses have clearly established that the relationship between pCR and long-term breast cancer survival depends heavily on the receptor subtype.
Differences in pathologic complete response rates among receptor subtypes of breast cancer
When comparing pCR among different receptor subtypes of breast cancer, there are two important questions to consider: how likely is neoadjuvant therapy to result in a pCR for that subtype, and how strongly is pCR correlated with event-free survival (EFS) in that subtype? To address the first issue of likelihood of achieving pCR, a pooled analysis by Kaufmann et al. of several neoadjuvant trials in 2006 showed that pCR is far less common in endocrine therapy responsive (i.e. ER+) tumours than in endocrine therapy unresponsive (i.e. ER-) tumours.13 Across six large trials13,20,23,25,31,32 (N range 250–2,411), the pCR rate ranged from 1.1% to 11.6% in the ER+ patients, compared with 15.4% to 42.2% in the ER- patients.13 The ER- population’s pCR range was similar to that of nine trials of HER2+ subjects with trastuzumab-containing regimens.13
These results were then confirmed by a larger pooled analysis from the CTNeoBC group in 2014.33 CTNeoBC was formed after the FDA recognized the prognostic potential of pCR and the promise of neoadjuvant trials in accelerating drug development, while at the same time acknowledging that more data were needed to firmly establish pCR as a reliable and valid surrogate endpoint. Pooling 12 large trials (N≥200 per trial), CTNeoBC analysed the tumour characteristics and survival data of 11,955 patients in a variety of breast cancer neoadjuvant trials. The three most commonly used definitions of pCR (Table 1) were also compared. ypT0/is ypN0 was the default definition of pCR for all analyses. Unsurprisingly, as the definition of pCR became stricter, incidence of pCR decreased (from 22% for ypT0/is to just 13% for ypT0 ypN0). Similar to the smaller analysis by Kaufmann et al., pCR rates (using the ypT0/is ypN0 definition) by receptor subtype followed previously described patterns (Table 2), with low-grade (1 or 2) ER+ tumours having the lowest rates of pCR (<10%), high-grade (3) ER+ tumours having higher rates of pCR (>15%), and triple-negative and ER-/HER2+ tumours having the highest rates of pCR (>30% and >50%, respectively).33 Several studies have also evaluated the association of pCR rates with true molecular subtypes (i.e. luminal A/B, HER2-enriched and basal/triple-negative subtypes). HER2-enriched tumours have been shown to have an increased likelihood of pCR regardless of ER status, with the likelihood higher still in ER- tumours.34 In the 2018 meta-analysis by Haque et al., which analysed pCR rates and their association with these different subtypes, the demonstrated correlation was consistent with that seen in studies of immunohistochemistry-based receptor subtype.35
Association between pathologic complete response and survival
After reviewing the likelihood of pCR among the different receptor subtypes, the next issue to consider is the strength of association between pCR and survival in each of those subtypes. The CTNeoBC study reported Kaplan–Meier EFS curves and hazard ratios (HR) by each subtype, revealing a strength of association between pCR and EFS that followed a similar pattern to the incidence of pCR by subtype described above (Table 3). pCR in the triple-negative population had the most significant association with EFS (HR 0.24), and a similarly significant association was also seen in the HER2+ group (HR 0.39). While pCR in the ER+/HER2+ subgroup was also significantly associated with EFS (HR 0.58), the overall strength of association appeared to be driven largely by the more significant ER- subgroup (HR 0.25). While ER+/HER2- disease also had a significant association with EFS, this positive signal was primarily driven by the patients with high-grade disease, and the correlation for those individuals with lower-grade tumours was not statistically significant.33
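As a reading aid, a hazard ratio can be restated as a relative reduction in the hazard of an event. The short sketch below does this arithmetic for the CTNeoBC EFS hazard ratios quoted above. The function name and the dictionary labels are illustrative choices made for this sketch, not terms from the study, and the conversion assumes the usual proportional-hazards reading of an HR.

```python
# EFS hazard ratios (pCR versus no pCR) as quoted in the text from CTNeoBC.
# The keys are labels chosen for this sketch; "ER-/HER2+" is the ER- subgroup
# of HER2+ disease mentioned in the paragraph above.
ctneobc_efs_hr = {
    "triple-negative": 0.24,
    "HER2+ (all)": 0.39,
    "ER+/HER2+": 0.58,
    "ER-/HER2+": 0.25,
}


def relative_hazard_reduction(hazard_ratio: float) -> float:
    """Percentage reduction in hazard implied by a hazard ratio."""
    return (1.0 - hazard_ratio) * 100.0


for subtype, hr in ctneobc_efs_hr.items():
    reduction = relative_hazard_reduction(hr)
    print(f"{subtype}: HR {hr:.2f} -> about {reduction:.0f}% lower hazard of an event")
```

Read this way, the gap between the ER+/HER2+ and ER-/HER2+ subgroups is easier to see: the same pCR label corresponds to a much smaller relative reduction in hazard when the tumour is ER+.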
CTNeoBC also compared the association between OS and pCR in each receptor subtype (Table 3). OS was most favourable in the pCR group compared with the non-pCR group in patients with ER-/HER2+ tumours who received trastuzumab (HR 0.08), followed by patients with triple-negative tumours (HR 0.16), ER-/HER2+ tumours without trastuzumab (HR 0.29) and high-grade ER+/HER2- tumours (HR 0.29). While the point estimates for OS were in favour of pCR for patients with low-grade ER+ and ER+/HER2+ tumours with or without trastuzumab, they did not reach statistical significance.33 The association between pCR and OS in the CTNeoBC study highlights the importance of targeted therapy in this surrogate, and may reveal the underlying reason for the relative lack of improvement in EFS for ER+ disease: the group with the strongest association between pCR and OS was the group that received the treatment most targeted to their tumour. While ER+ tumours certainly have targeted agents in the market, endocrine therapy was not a component of any of the treatment arms.
In 2020, Spring et al. performed a large meta-analysis of more than 50 trials conducted between 1999 and 2016, which represented 27,895 patients in total (range 2–11,955).36 Based on their analyses (which used Bayesian modelling to estimate HRs), patients who had pCR had significantly improved EFS compared with those patients without pCR (HR 0.31). Similar to CTNeoBC, their analyses demonstrated a stronger association between pCR and EFS in patients with triple-negative (HR 0.18) and HER2+ disease (HR 0.32), with HRs that were lower (i.e. associations that were stronger) than those reported by CTNeoBC (HR 0.24 and 0.39, respectively). Improved EFS with pCR was also demonstrated in hormone receptor-positive disease, although the association only trended toward significance. The association between pCR and improved OS was again seen in triple-negative (HR 0.20) and HER2+ disease (HR 0.13). The improvement in survival for hormone receptor-positive disease, in contrast, was slight and with wide probability intervals.36
The most recent interim analysis of the I-SPY 2 Trial Consortium further demonstrated a significant relationship between pCR and EFS, though survival curves appeared to be more consistent across receptor subtypes. I-SPY 2 is a large, multicentre clinical trial using an adaptive randomization design to evaluate new drugs in the neoadjuvant setting, which uses pCR (defined as ypT0/is ypN0) as the primary outcome.7 Since 2010, almost 2,000 patients across eight receptor subtypes have been enrolled in the trial, and the evaluation of 15 investigational therapies has been completed. Investigators recently published data describing the relationship between pCR and 3-year outcomes (EFS and DFS) for the first 950 patients randomized.37 Overall, 34.7% of patients achieved pCR (Table 2). The groups with the highest rate of pCR were the ER-/HER2+ (68%), triple-negative (42%) and ER+/HER2+ (40%) groups receiving investigational therapy, with ER+/HER2- having the lowest rate of pCR (18%). Among patients receiving control chemotherapy, pCR rates were overall lower (19.3%), but followed a similar pattern when analysed by receptor subtype. pCR was most strongly associated with EFS (Table 3) in both hormone receptor-positive/HER2- and hormone receptor-negative/HER2+ disease (both HR 0.14), followed by hormone receptor-positive/HER2+ tumours (HR 0.15) and triple-negative tumours (HR 0.18).37 This study, however, did not enroll patients with low-risk (by gene assay) hormone receptor-positive/HER2- tumours or small tumours (<2.5 cm), which likely eliminated the diluting effect of low-grade tumours seen in the CTNeoBC study.
Using pathologic complete response to compare neoadjuvant regimens
Clearly, in pooled patient-level analyses, pCR is strongly associated with EFS and OS, although the strength of association varies by receptor subtype and the use of targeted therapy. The issue, however, becomes much more complicated when performing a trial-level analysis, which asks whether a treatment's effect on pCR predicts its effect on EFS and OS when treatment groups are compared. CTNeoBC was the first analysis to compare pCR with EFS and OS amongst several large trials. The trial-level analysis was performed by quantifying each treatment’s effect on pCR and EFS/OS using a weighted linear regression (log scaled), and excluded non-randomized groups and groups that received additional chemotherapy in the adjuvant setting.33 The HRs for EFS and OS were plotted against the odds ratio for pCR in each trial. The result: there was no correlation at the trial level between pCR and EFS (R²=0.03) or OS (R²=0.24).33 This was true both in the overall analysis and in each receptor subgroup.
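The trial-level approach described above can be sketched in a few lines. The example below is a minimal illustration, not the CTNeoBC code: it fits a weighted least-squares line to log(HR) against log(OR) and reports a weighted R². The function name, the use of trial size as the weight, and every number in the input lists are assumptions made for this sketch; the inputs are synthetic and do not come from any study.

```python
import numpy as np


def weighted_trial_level_fit(log_or_pcr, log_hr_survival, weights):
    """Weighted least-squares fit of log(HR) on log(OR).

    Returns (slope, intercept, weighted R^2).
    """
    x = np.asarray(log_or_pcr, dtype=float)
    y = np.asarray(log_hr_survival, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x), x])

    # Solve the weighted normal equations (X' W X) beta = X' W y.
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)

    # Weighted R^2: share of weighted variance in y explained by the line.
    fitted = X @ beta
    y_bar = np.sum(w * y) / np.sum(w)
    ss_res = np.sum(w * (y - fitted) ** 2)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    return beta[1], beta[0], 1.0 - ss_res / ss_tot


# Synthetic trial summaries for illustration only (not real data).
odds_ratios_pcr = [1.2, 1.5, 2.0, 1.1, 1.8]
hazard_ratios_efs = [0.95, 1.05, 0.90, 1.00, 0.98]
trial_sizes = [300, 450, 250, 1200, 600]  # assumed weighting scheme

slope, intercept, r2 = weighted_trial_level_fit(
    np.log(odds_ratios_pcr), np.log(hazard_ratios_efs), trial_sizes
)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, weighted R^2={r2:.3f}")
```

A weighted R² near 0 means that knowing how much a regimen raised the pCR rate in a trial says little about how much it changed EFS or OS in that trial, which is the sense in which the R² values above are interpreted.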
A subsequent analysis by Berruti et al. in 2014 revealed similar findings.38 Twenty-nine trials (including all of the trials in the CTNeoBC analysis), with a total of 14,641 patients, were included. As in CTNeoBC, a weighted regression analysis was performed on log-transformed treatment effect estimates in order to test the association between treatment effect on pCR and treatment effect on EFS/OS. Again, only a weak association between the two effects was demonstrated: the R² value of the weighted regression line was 0.08 for DFS and 0.09 for OS, indicating that only 8% and 9% of the variability in the effect on DFS and OS, respectively, was explained by the observed effect on pCR.38
These conflicting findings are a source of confusion and controversy in the academic world. The authors of the CTNeoBC and Berruti et al. analyses offer several reasons for this finding. First, most of the trials enrolled women with more than one subtype of breast cancer. Since the more common type of breast cancer (low-grade ER+) has a weaker and often insignificant correlation between pCR and survival, these patients may dilute and obscure the effect on the trial level. Second, the differences in drug administration and combination between the three trastuzumab-containing trials for HER2+ patients (including post-operative trastuzumab only in the TECHNO trial39) may have confounded any comparison between these groups. Finally, factors influencing EFS and OS that are unrelated to primary tumour response may have impacted the association.
To date, only one meta-analysis has demonstrated a trial-level association between treatment effect on pCR and treatment effect on survival outcomes. In 2020, Huang et al. published analyses of eight large randomized controlled trials conducted between 2013 and 2018, which included 2,478 patients in total.40 Similar to prior analyses, weighted linear regressions were performed on log-transformed treatment effect estimates in order to test the association between treatment effect on pCR and treatment effect on EFS/OS. Although six of the eight randomized controlled trials included patients with all receptor subtypes, analysis was only performed on data for patients with triple-negative disease. The results revealed that pCR was a significant predictor of EFS (R² 0.68) and OS (R² 0.24),40 although the association with OS was weaker. These results likely reflect the homogeneity of the patient population included in their analysis (triple-negative disease only) and the inclusion of more recent clinical trials that included newer therapies (such as bevacizumab, everolimus and nab-paclitaxel). These findings suggest that a trial-level association between pCR and survival is most likely to be observed when comparing trials with more homogeneous patient populations, especially those that incorporate targeted therapies.
Association between pathologic complete response and survival among trials of targeted therapies
There are, however, still examples of conflicting findings among trials in more homogeneous populations receiving targeted therapy. Examples are the ALTTO and NeoALTTO trials,41,42 which studied the use of an oral anti-HER2 tyrosine kinase inhibitor (lapatinib) in the adjuvant and neoadjuvant settings, respectively. NeoALTTO was reported 2 years before ALTTO, and showed a statistically significant improvement in pCR when lapatinib was added to trastuzumab compared with trastuzumab alone (51.3% versus 29.5%, p=0.0001).42 Surprisingly, ALTTO subsequently showed that the addition of lapatinib (or lapatinib alone) was associated with a non-significant improvement in DFS compared with trastuzumab (HR 0.84, p=0.04).41 This difference was not significant because a protocol amendment had set the significance threshold at a p value of 0.025, to account for multiple pairwise comparisons among treatment groups. There are several explanations for the observed differences between these studies. First, the underlying risk of the patient population in ALTTO appears to have been greater than that of NeoALTTO (lower DFS and more node-positive patients in ALTTO). ALTTO also included a greater percentage of concurrently ER+ tumours when compared with NeoALTTO (57% versus 50%, respectively), which are less sensitive to HER2 therapy and likely blunted the effect of lapatinib. ALTTO was also underpowered, reporting after just 555 events were reached, instead of the originally planned 850 events.
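The role of the amended threshold can be made concrete with a small check. The sketch below compares the reported ALTTO p value with a conventional 0.05 threshold and with the 0.025 threshold described above. The helper names are hypothetical, and the Bonferroni-style split of 0.05 across two comparisons is shown only as one common way to arrive at 0.025; whether the protocol derived its threshold this way is an assumption made here.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold when alpha is split evenly across comparisons."""
    return alpha / n_comparisons


def meets_threshold(p_value: float, threshold: float) -> bool:
    """True if the p value is at or below the significance threshold."""
    return p_value <= threshold


altto_p_value = 0.04  # DFS comparison as quoted in the text

conventional = 0.05
adjusted = bonferroni_threshold(conventional, 2)  # assumption: two comparisons

print(f"adjusted threshold: {adjusted:.3f}")
print(f"significant at 0.05?  {meets_threshold(altto_p_value, conventional)}")
print(f"significant at 0.025? {meets_threshold(altto_p_value, adjusted)}")
```

The same result would have been labelled significant under an unadjusted 0.05 threshold, which is why the direction and size of the effect deserve as much attention as the significance label.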
Ultimately, however, it is important to note that both trials found an improvement in DFS with the addition of lapatinib. The effects were in the same direction, but the underpowered ALTTO study showed a non-statistically significant improvement in DFS, while the adequately powered NeoALTTO study’s difference in DFS was significant. Statistical significance is not the same as truth, and is often bemoaned by biostatisticians and researchers, to the point where some journals prohibit some tests of statistical significance.43
Pertuzumab’s story is somewhat different. The neoadjuvant NeoSphere trial showed that the combination of docetaxel with trastuzumab + pertuzumab was associated with a pCR rate of 45.8%, which was significantly higher than the rate seen with trastuzumab and docetaxel (29%, p=0.0141).6 This improvement in pCR led to an FDA approval of pertuzumab in neoadjuvant breast cancer, and widespread adoption of using neoadjuvant pertuzumab in clinical practice. There was such enthusiasm for pertuzumab that the National Comprehensive Cancer Network (NCCN) included pertuzumab in adjuvant treatment guidelines, even though no study supporting the use of adjuvant pertuzumab had been reported or published.44
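Because the trial-level analyses discussed earlier express a regimen's effect on pCR as an odds ratio, it can help to see how the NeoSphere pCR rates translate. The sketch below computes the absolute difference and a crude, unadjusted odds ratio from the two percentages quoted above. The function names are illustrative, and the resulting odds ratio is simple arithmetic on the reported rates, not an estimate taken from the NeoSphere publication.

```python
def odds(probability: float) -> float:
    """Convert a probability (0 to 1, exclusive) into odds."""
    return probability / (1.0 - probability)


def crude_odds_ratio(p_treatment: float, p_control: float) -> float:
    """Unadjusted odds ratio from two response rates."""
    return odds(p_treatment) / odds(p_control)


# pCR rates as quoted in the text for NeoSphere.
pcr_with_pertuzumab = 0.458     # trastuzumab + pertuzumab + docetaxel
pcr_without_pertuzumab = 0.29   # trastuzumab + docetaxel

absolute_difference = pcr_with_pertuzumab - pcr_without_pertuzumab
pcr_odds_ratio = crude_odds_ratio(pcr_with_pertuzumab, pcr_without_pertuzumab)

print(f"absolute difference in pCR rate: {absolute_difference * 100:.1f} percentage points")
print(f"crude odds ratio for pCR: {pcr_odds_ratio:.2f}")
```

On these figures the odds of pCR roughly double with the addition of pertuzumab, a far larger relative effect than the HRs later reported for adjuvant pertuzumab.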
The adjuvant pertuzumab study, APHINITY,45 unlike ALTTO, was a positive trial, albeit with a far less drastic difference than NeoSphere would have suggested. Overall, patients who received the combination of trastuzumab + pertuzumab experienced a significant 1.6% absolute reduction in 3-year invasive-disease events over the trastuzumab + placebo group (HR 0.81). The effect was greatest in the patients with node-positive disease, with a 2.9% absolute reduction of invasive-disease events (HR 0.77).45 The more recently published results at 6 years of follow-up were consistent with this primary analysis (DFS HR 0.76, OS HR 0.85), although the interim OS analysis failed to reach the p value of 0.0012 required for statistical significance.46 The difference in the extent of benefit of pertuzumab between NeoSphere and APHINITY may again be explained by differences in trial design and study population. While APHINITY, unlike ALTTO, was adequately powered, it contained 64% ER+ patients, whereas NeoSphere only contained 47% ER+ patients, potentially diluting the effect of any HER2-targeted agents.
Using pathologic complete response versus residual disease status to guide future therapy
While a drug’s ability to effect a pCR may not perfectly predict its ability to improve DFS, two large trials within the last several years have clearly established pCR as a key criterion in guiding decisions about adjuvant therapy in triple-negative and HER2+ disease, the receptor subtypes where pCR is most common and most strongly associated with survival outcomes. The CREATE-X trial in 2017 randomized more than 900 patients with early-stage, HER2- breast cancer with residual disease after neoadjuvant chemotherapy (i.e. no pCR) to receive either capecitabine plus standard therapy or standard therapy alone.47 Their results showed improved OS (HR 0.59) and DFS (HR 0.70) in the capecitabine group compared with the control group, with subgroup analyses showing that this result was driven primarily by patients with triple-negative disease (DFS HR 0.58; OS HR 0.52).47 Since that time, adjuvant capecitabine has been the standard of care therapy for patients with triple-negative breast cancer with residual disease following neoadjuvant chemotherapy.
Similarly, the KATHERINE trial in 2019 randomized more than 1,000 patients with early-stage, HER2+ breast cancer with residual disease after neoadjuvant therapy (i.e. no pCR) to receive either trastuzumab emtansine (T-DM1) or standard therapy (trastuzumab alone).48 This study also showed higher rates of DFS (88.3% versus 77.0% at 3 years) and a significant reduction in the risk of recurrence (10.5% versus 15.9%) among patients who received T-DM1 compared with those who received trastuzumab alone.48 Since that time, T-DM1 has been the standard of care for patients with HER2+ breast cancer with residual disease following neoadjuvant chemotherapy.
The results of these studies illustrate that, on the trial level, while the presence of pCR may not consistently predict survival outcomes for different treatment arms, the absence of pCR may still help to identify those patients who would most benefit from additional therapy. This point was further illustrated in the trial by Lluch et al. in 2019 (the GEICAM/CIBOMA study), which randomized over 800 patients with triple-negative breast cancer who had previously undergone chemotherapy to receive either capecitabine or observation alone.49 The overall results showed no statistically significant difference in DFS (HR 0.82, p=0.136) or OS (fully adjusted HR 0.86, p=0.371). Differences in study population may partially account for this difference, as the control arm of the CREATE-X trial had a significantly higher risk of relapse than patients in the GEICAM/CIBOMA study, which some have suggested may be due to genetic/racial differences in capecitabine metabolism. This study, however, allowed for both adjuvant (79%) and neoadjuvant (20%) chemotherapy to have been given, and therefore, did not account for prior response to chemotherapy based on residual disease versus pCR. Therefore, patients who had an excellent response to chemotherapy, who likely derive no further benefit from additional chemotherapy, were included in this trial, which may account for the negative results.
Does pathologic complete response meet the criteria of a good surrogate endpoint?
Given the conflicting evidence regarding pCR described above, the validity of pCR in predicting long-term survival is controversial and remains the subject of debate.50 In determining the ability of a surrogate endpoint to predict clinical endpoints, there are many factors to consider. Bucher et al. proposed two main criteria in evaluating surrogate endpoints,51 which help to highlight the key issues complicating pCR as an endpoint in breast cancer. First, they asserted that the surrogate must be in the causal pathway between the drug or intervention and the clinical outcome of interest. In breast cancer, pCR appears to meet this criterion. pCR is a direct evaluation of biopsy-proven, localized or locally advanced malignancy following the administration of chemotherapy, after which, the only intervention that could have effected a change on that tumour is (presumably) the neoadjuvant regimen.
The second criterion proposed by Bucher et al. is the ability of the surrogate endpoint to wholly capture the effect of the intervention of interest. If an intervention affects a clinical outcome through multiple pathways, and a surrogate endpoint only represents the effects on one specific pathway in that disease, then it will be a less comprehensive, and therefore less valid, surrogate endpoint. Outside of oncology, this phenomenon is illustrated by the impact of fibrates on cholesterol. While fibrates lower cholesterol, a decrease in this commonly used surrogate endpoint in cardiac morbidity and mortality is actually associated with an increased rate of mortality.52 This speaks to the caution that one must take when evaluating a surrogate endpoint across classes of drugs, as fibrates have actions distinct from other cholesterol-lowering classes of drugs (such as statins) that are not represented by the surrogate of cholesterol (such as hepatic toxicity). In the case of neoadjuvant systemic chemotherapy for breast cancer, the surrogate is comprehensive in its evaluation of the mechanism. While other serum biomarkers, such as CA 27.29 or CA 15-3, are indirect representations of tumour activity,53 pCR is a culmination of the final common pathway that dictates whether or not a tumour will survive.
The ensuing question, however, remains: does pCR mean the same thing regardless of the receptor subtype or the therapy that brought about the pCR? As noted above, the ability of pCR to predict EFS is impacted by a tumour’s receptor status and, in some cases, the type of therapy received. This suggests that other factors besides pCR are impacting recurrence, especially in ER+ tumours. This is reflected in the differing recurrence patterns seen in each receptor subtype. While ER+/HER2- breast cancer generally portends a better prognosis than HER2+ or triple-negative breast cancers, ER+ breast cancer is most likely to recur during periods of longer follow-up.54 Mechanisms underlying this observed dormancy in ER+ breast cancer, including dormant tumour cells (DTCs) and re-activation pathways, are poorly understood but a topic of active research.55,56 The variance in association of pCR with clinical endpoints may be explained by these other pathways being differentially expressed within each receptor group, as they exert their action separately from the causal pathway of pCR and EFS.
Conclusion: Rationale for continued use of pathologic complete response in clinical trials
So, is pCR a valid surrogate endpoint, and should we continue to use it to study and approve new drugs in breast cancer? In the opinion of these authors (and the FDA): yes, but not always.10,57 As outlined above, it is clear that the relationship between pCR and important survival outcomes is obscured by ER positivity, and that pCR should be most strongly considered in triple-negative and HER2+ populations. Careful attention should be paid to the percentage of ER+ patients when evaluating future neoadjuvant studies that contain HER2-targeted agents. But among triple-negative and HER2+ patients, pCR is clearly a frequent phenomenon that has been consistently and strongly correlated with important survival outcomes at the patient level. While pCR has not been shown to consistently correlate with these same survival outcomes when compared across studies, heterogeneous patient populations, inconsistent methodologies and a dearth of targeted trials in the largest definitive meta-analysis33 are potential explanations for this lack of correlation. Additionally, low event rates in these studies typically result in trials being underpowered to detect this inter-trial difference, and meta-analyses like the CTNeoBC study should be regularly updated.58
As more neoadjuvant studies become available, especially in targeted populations, more meta-analyses may be able to show a trial-level association. Furthermore, adjuvant trials of targeted agents that have shown success in the neoadjuvant setting have repeatedly trended towards supporting pCR as a surrogate for survival outcomes, even when true statistical significance is not met. Finally, while the presence of pCR may not consistently predict improved survival outcomes on the trial level, the absence of pCR may still help to identify those patients who would most benefit from additional therapy. Ultimately, the combination of a strong, patient-level correlation and the prospect of expedited drug development makes pCR a reasonable and valid endpoint on which to base clinical and regulatory decisions, though caution should be exercised when applying findings based on pCR outside of their narrowly defined contexts.