How is the concordance index calculated in a Cox model if the actual event times are not predicted?

I am new to the field of survival analysis. While reading about the interpretation of the C-index, I realized that it only depends on the ordering of the predictions. I have always used the scikit-survival package and never thought deeply about how the C-index is calculated when a Cox proportional hazards model does not predict the actual survival times. I would appreciate it if someone could explain this to me in simple terms.

Cross Validated Asked on November 14, 2021

You are correct that time is not the default output of a Cox model. However, for any given unit with its covariate pattern, the model gives a relative hazard. Units with a higher relative hazard are expected to have a shorter time to event. The censored c-index compares the ordering of the estimated relative hazards with both the actual event status and the actual time to event (or censoring time) to produce its estimate.

Answered by Todd D on November 14, 2021

Below is my attempt to answer this question.

The concordance index is a measure of how well your model discriminates.
For survival analysis, say you have a covariate $$X$$ and a survival time $$T$$.
Assume that higher values of $$X$$ imply shorter values of $$T$$ (thus $$X$$ has a deleterious effect on $$T$$).
Discrimination means that, given two patients, you are able to say with high reliability which one will have the shorter survival time.

For a perfectly discriminative model, if you pick two subjects at random, $$(X_1,T_1)$$ and $$(X_2,T_2)$$, then the one with the larger value of $$X$$ will have, with probability $$1$$, a shorter survival time:

$$c=\mathbb P( T_1 < T_2 \mid X_1 \geq X_2) = 1$$

In your dataset, if you pick two patients at random, there are four cases:

1. $$X_1 > X_2$$ and $$T_1 < T_2$$ : Concordance $$(C)$$
2. $$X_1 > X_2$$ and $$T_1 > T_2$$ : Discordance $$(D)$$
3. $$X_1 = X_2$$ : Equal risks $$(R)$$
4. $$T_1 = T_2$$ : Equal times

The last case is not taken into account to estimate the concordance (at least I think so).

In case $$3$$, since the two patients have the same risk, the best you can do to say which one will have the shorter survival time is to toss a fair coin.

The estimated concordance index based on your data is:

$$\hat c= \frac{C+\frac{R}{2}}{C+D+R}$$ where $$C$$ and $$D$$ are the total numbers of concordant and discordant couples, and $$R$$ is the total number of couples with exactly the same risk. The $$\frac{R}{2}$$ in the numerator comes from the coin toss.
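The counting behind this estimator can be sketched in plain Python. This is only a minimal illustration without censoring: the function names `concordance_counts` and `c_hat` and the four-patient toy data are made up for the example, and a higher `x` is assumed to mean a shorter survival time.

```python
from itertools import combinations

def concordance_counts(x, t):
    """Count concordant (C), discordant (D) and equal-risk (R) couples.

    Assumes no censoring and that a higher x means a shorter time t.
    Couples with equal times (case 4) are skipped.
    """
    C = D = R = 0
    for (x1, t1), (x2, t2) in combinations(zip(x, t), 2):
        if t1 == t2:
            continue              # equal times: not used
        if x1 == x2:
            R += 1                # equal risks: counts as a coin toss
        elif (x1 > x2) == (t1 < t2):
            C += 1                # higher x goes with the shorter time
        else:
            D += 1
    return C, D, R

def c_hat(x, t):
    C, D, R = concordance_counts(x, t)
    return (C + R / 2) / (C + D + R)

# Hypothetical toy data: 4 patients, higher x = higher risk
x = [5, 3, 3, 1]
t = [2, 4, 6, 5]
print(concordance_counts(x, t))  # (4, 1, 1)
print(c_hat(x, t))               # 0.75
```

Here the two patients with `x = 3` form the single equal-risk couple, which contributes one half to the numerator.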

When there is censoring (as is often the case with survival data), the computation of $$\hat c$$ is modified, but the idea and interpretation of $$c$$ remain the same.
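One common modification (Harrell's C) keeps only the couples whose ordering is actually known: a couple is usable when the patient with the shorter observed time had the event. The sketch below illustrates that idea under simplifying assumptions; the function name `censored_c_index` and the data are hypothetical, and couples with tied observed times are simply skipped, which real packages handle in more refined ways.

```python
from itertools import combinations

def censored_c_index(risk, time, event):
    """Concordance restricted to comparable couples.

    risk  : predicted risk score (higher = expected to fail sooner)
    time  : observed time (event time or censoring time)
    event : 1 if the event was observed, 0 if censored
    """
    num = den = 0.0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue               # simplification: skip tied observed times
        first, last = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue               # shorter time is censored: true order unknown
        den += 1
        if risk[first] > risk[last]:
            num += 1               # concordant
        elif risk[first] == risk[last]:
            num += 0.5             # equal risks: coin toss
    return num / den

# Hypothetical data: patient 2 (time 2) is censored
risk = [4, 1, 2, 3]
time = [1, 2, 3, 4]
print(censored_c_index(risk, time, [1, 0, 1, 1]))  # 0.75
print(censored_c_index(risk, time, [1, 1, 1, 1]))  # 0.5
```

With the censored patient, two couples drop out of the comparison, so the estimate is based on 4 couples instead of 6.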

Example

Say you have $$8$$ patients with the following data:

$$\begin{array}{c|c|c} \text{Id} & \text{Time } (T) & X \\ \hline 1 & 1 & 1 \\ 2 & 2 & 3 \\ 3 & 3 & 2 \\ 4 & 12 & 10 \\ 5 & 17 & 15 \\ 6 & 27 & 40 \\ 7 & 36 & 60 \\ 8 & 55 & 80 \end{array}$$

In that case, we see that larger values of $$X$$ imply larger values of $$T$$. Thus a couple is concordant if $$X_1 < X_2$$ and $$T_1 < T_2$$.

There are $$\binom{8}{2}=28$$ possible couples of patients. Among those, only the couple $$(2,3)$$ is discordant (since $$X_2 > X_3$$ but $$T_2 < T_3$$). There is no couple with equal risk, thus $$R=0$$.

Then the estimated concordance index is $$\frac{27}{28} \approx 0.964$$.

You can check this with the R package survival (sorry I'm not used to survival analysis with Python):

    library(survival)
    time <- c(1, 2, 3, 12, 17, 27, 36, 55)
    X <- c(1, 3, 2, 10, 15, 40, 60, 80)
    data <- data.frame(time = time, X = X)
    # All 8 patients had the event (status = 1), so there is no censoring
    mod <- coxph(Surv(time, rep(1, 8)) ~ X, data = data)
    summary(mod)$concordance  # C ~ 0.964
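The same count can also be reproduced by hand. The following Python sketch only enumerates the couples of the example above (it does not fit a Cox model) and uses the convention of this example, where a couple is concordant when $$X$$ and $$T$$ are ordered the same way.

```python
from itertools import combinations

ids  = [1, 2, 3, 4, 5, 6, 7, 8]
time = [1, 2, 3, 12, 17, 27, 36, 55]
X    = [1, 3, 2, 10, 15, 40, 60, 80]

pairs = list(combinations(range(8), 2))

# A couple is discordant when X and T are ordered in opposite ways
discordant = [(ids[i], ids[j]) for i, j in pairs
              if (X[i] < X[j]) != (time[i] < time[j])]

# There are no ties in X or T here, so every other couple is concordant
n_concordant = len(pairs) - len(discordant)

print(len(pairs))                            # 28
print(discordant)                            # [(2, 3)]
print(round(n_concordant / len(pairs), 3))   # 0.964
```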


So to answer your question about predicted times: neither the values of $$T$$ nor those of $$X$$ change the estimate of $$c$$; it is only a matter of ordering between the predictor and the survival times. You can change the values in the previous example and, as long as the concordant and discordant couples stay the same, you still get the same estimated concordance.
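A quick way to convince yourself is to apply strictly increasing transformations to both variables and recompute the index. The sketch below does this in Python on the example data; the helper `c_index` and the particular transformations (a logarithm for $$X$$, an affine map for $$T$$) are arbitrary choices made for illustration.

```python
import math
from itertools import combinations

def c_index(x, t):
    """Fraction of couples in which x and t are ordered the same way.

    Assumes no ties in x or t, as in the example data.
    """
    pairs = list(combinations(range(len(t)), 2))
    same = sum((x[i] < x[j]) == (t[i] < t[j]) for i, j in pairs)
    return same / len(pairs)

time = [1, 2, 3, 12, 17, 27, 36, 55]
X    = [1, 3, 2, 10, 15, 40, 60, 80]

original = c_index(X, time)

# Strictly increasing transformations keep every pairwise ordering intact
X_new    = [math.log(v) for v in X]
time_new = [10 * v + 100 for v in time]
transformed = c_index(X_new, time_new)

print(round(original, 3))        # 0.964
print(original == transformed)   # True
```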

In which direction should I look for the covariate $$X$$?

Is a couple concordant if $$X_1 > X_2$$ and $$T_1 < T_2$$ or if $$X_1 < X_2$$ and $$T_1 < T_2$$?

For the Cox model, it depends on the estimated hazard ratio. If the ratio $$e^\beta$$ is $$>1$$, then larger values of $$X$$ imply larger risks and thus shorter times. So if $$e^\beta > 1$$, a couple is concordant if $$X_1 > X_2$$ and $$T_1 < T_2$$; if $$e^\beta < 1$$, a couple is concordant if $$X_1 < X_2$$ and $$T_1 < T_2$$.
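The effect of the sign can be illustrated on the example data, where larger $$X$$ goes with longer times, so a fitted coefficient would be negative. In the Python sketch below, the value `beta = -0.1` is an assumed number, not a fitted estimate; only its sign matters for the ordering.

```python
from itertools import combinations

def c_index(risk, t):
    """Share of couples where the higher-risk patient has the shorter time.

    Assumes no ties in risk or t.
    """
    pairs = list(combinations(range(len(t)), 2))
    ok = sum((risk[i] > risk[j]) == (t[i] < t[j]) for i, j in pairs)
    return ok / len(pairs)

time = [1, 2, 3, 12, 17, 27, 36, 55]
X    = [1, 3, 2, 10, 15, 40, 60, 80]

beta = -0.1                      # assumed value with e^beta < 1
risk = [beta * x for x in X]     # larger X gives a lower risk score

print(round(c_index(risk, time), 3))  # 0.964
print(round(c_index(X, time), 3))     # 0.036 (X read in the wrong direction)
```

Reading the covariate in the wrong direction turns 27 concordant couples into 27 discordant ones, which is why the second value is $$1 - 0.964$$.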

Finally, in the case of a vector of covariates, I think the procedure remains the same, but instead of using the vector $$X$$ we use the predicted risk $$\hat\beta X$$, with $$\hat\beta$$ estimated from the Cox model.
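As a rough sketch of that last step in Python: each patient's covariate vector is collapsed into a single risk score, and the couples are then compared on that score exactly as before. The coefficients `beta_hat`, the covariate rows, and the times are all invented for illustration; in practice `beta_hat` would come from the fitted Cox model.

```python
from itertools import combinations

def linear_predictor(beta, row):
    """Risk score beta . x for one patient."""
    return sum(b * v for b, v in zip(beta, row))

def c_index(risk, t):
    """Share of couples where the higher-risk patient has the shorter time.

    Assumes no ties in risk or t.
    """
    pairs = list(combinations(range(len(t)), 2))
    ok = sum((risk[i] > risk[j]) == (t[i] < t[j]) for i, j in pairs)
    return ok / len(pairs)

beta_hat = [0.5, -1.0]                     # assumed coefficients, not fitted
rows = [[4, 0], [3, 1], [2, 0], [1, 1]]    # hypothetical covariates, one row per patient
time = [2, 3, 1, 4]                        # hypothetical event times

risk = [linear_predictor(beta_hat, r) for r in rows]
print(risk)                           # [2.0, 0.5, 1.0, -0.5]
print(round(c_index(risk, time), 3))  # 0.833
```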

Answered by periwinkle on November 14, 2021
