---
title: "Survival analysis using R-INLA"
subtitle: "UNISA, South Africa, 2019"
author: "Janet van Niekerk"
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# What is survival data?
![](survpic.png)
# Parametric survival models
First we will consider parametric survival models.
The parametric models available in *INLA* are *exponentialsurv, weibullsurv, loglogisticsurv, lognormalsurv, gamma.surv*.
## Survival function models
1. Exponential : $$t\sim Exp(\lambda),\lambda=\exp(\eta)$$
2. Weibull: $$t\sim Weibull(\alpha,\lambda),\lambda=\exp(\eta)$$
3. Loglogistic: $$\log(t)\sim Logistic(\alpha,\lambda),\lambda=\exp(\eta)$$
4. Lognormal: $$\log(t)\sim N(\eta,\tau)$$
In all of these, $$\eta=\sum_{j=1}^{n_\beta}\pmb{\beta}_j'\pmb{X}_j+\sum_{k=1}^{n_f}f^k(\pmb{u}_k).$$
Random components $f^k(\pmb{u}_k)$ of covariates $\pmb{u}_k$ can be splines, spatial components, frailties, clustering effects etc.
## Examples
### 1. Ovarian Cancer Survival Data
Survival in a randomised trial to compare two treatments for ovarian cancer.
__Covariates__:
futime: survival or censoring time
fustat: censoring status
age: in years
resid.ds: residual disease present (1=no,2=yes)
rx: treatment group
ecog.ps: ECOG performance status (1 is better, see Edmunson (1979))
```{r}
library(INLA)
library(survival)
ovarian_data<-ovarian
ovarian_data
```
Now we can examine the survival curve for each treatment. This can give us an indication of possible differences between treatments. The Kaplan-Meier curve is a nonparametric estimate of the survival curve and can be used as a visual summary of the survival times.
_For numerical stability it is advisable to rescale the time axis to the unit axis_.
```{r}
#Scale the time axis
mtime<-max(ovarian_data$futime)
mtime
ovarian_data$futime<-ovarian_data$futime/mtime
#Kaplan-meier curves for each treatment
ovarian_surv<-Surv(ovarian_data$futime,ovarian_data$fustat)
ovarian_surv
ovarian_km<-survfit(ovarian_surv~rx,data=ovarian_data)
plot(ovarian_km,col=c("red","blue"),lty=c(1,2))
legend(x=0.6,y=0.3,legend=c("Treatment 1", "Treatment 2"), col=c("red","blue"),lty=c(1,2))
```
Note that in this curve no effect of other covariates are taken into account so the apparent difference can be due to some other factors as well. We thus need to do a proper model and test for significant covariate effects. We can do this using `inla`.
For this we need to create a survival object in INLA, using `inla.surv`.
```{r}
ovarian_surv<-inla.surv(time=ovarian_data$futime,event=ovarian_data$fustat)
#View(ovarian_surv)
model_ovarian<-inla(formula=ovarian_surv~-1+as.factor(rx),family="exponentialsurv",
data=ovarian_data,verbose=TRUE)
summary(model_ovarian)
#Plot results
plot(ovarian_km,col=c("red","blue"),lty=c(2,2))
lines(seq(0.01,1,0.05),exp(-exp(model_ovarian$summary.fixed$mean[1])*seq(0.01,1,0.05)),
col="red")
lines(seq(0.01,1,0.05),exp(-exp(model_ovarian$summary.fixed$mean[2])*seq(0.01,1,0.05)),
col="blue")
legend(x=0.6,y=0.3,legend=c("Treatment 1", "Treatment 2"), col=c("red","blue"),lty=c(1,1))
```
Now let's include the other covariates.
```{r}
model_ovarian1<-inla(formula=ovarian_surv~-1+as.factor(rx)+age+as.factor(ecog.ps),
family="exponentialsurv",data=ovarian_data)
summary(model_ovarian1)
#Plot results
plot(ovarian_km,col=c("red","blue"),lty=c(2,2))
lines(seq(0.01,1,0.05),exp(-exp(model_ovarian1$summary.fixed$mean[1]+
model_ovarian1$summary.fixed$mean[3]*mean(ovarian_data$age))*
seq(0.01,1,0.05)),col="red")
lines(seq(0.01,1,0.05),exp(-exp(model_ovarian1$summary.fixed$mean[2]+
model_ovarian1$summary.fixed$mean[3]*mean(ovarian_data$age))*
seq(0.01,1,0.05)),col="blue")
legend(x=0.6,y=0.3,legend=c("Treatment 1", "Treatment 2"), col=c("red","blue"),lty=c(1,1))
```
So we can conclude that the treatment has a statistically significant effect.
For interpretation we should remember to scale the time axis to the original scale to find for instance the median survival time.
```{r}
plot(seq(0.01,2,0.05)*mtime,exp(-exp(model_ovarian1$summary.fixed$mean[1]+
model_ovarian1$summary.fixed$mean[3]*mean(ovarian_data$age))*
seq(0.01,2,0.05)),col="red",type="l",xlab="Time",ylab="Survival function")
lines(seq(0.01,2,0.05)*mtime,exp(-exp(model_ovarian1$summary.fixed$mean[2]+
model_ovarian1$summary.fixed$mean[3]*mean(ovarian_data$age))*
seq(0.01,2,0.05)),col="blue")
abline(a=0.5,b=0,col="green")
legend(x=1500,y=0.9,legend=c("Treatment 1", "Treatment 2"), col=c("red","blue"),lty=c(1,1))
```
# Semi-parametric model - Cox proportional hazards model
The Cox proportional hazards model is a regression model based on the assumption of proportional hazards accross subjects. The covariates are included in the model through a parametric linear predictor but the baseline hazard function (when all covariate values are $0$) is estimated nonparametrically.
The Cox model is implemented in `INLA` through the equivalence of the Cox model to a series of Poisson regression models \cite{martino2011}.
## Examples
### 1. Ovarian Cancer Survival Data (revisited)
We revisit the ovarian cancer example and do the analysis using a Cox proportional hazards model.
```{r}
model_ovarian2<-inla(formula=ovarian_surv~-1+as.factor(rx)+age+as.factor(ecog.ps),
family="coxph",data=ovarian_data)
summary(model_ovarian2)
#Plot baseline hazard
plot(model_ovarian2$summary.random$baseline.hazard$ID,
exp(model_ovarian2$summary.random$baseline.hazard$mean),
type="l",xlab="Time(rescaled)",ylab="Baseline hazard function")
#Plot results
plot(model_ovarian2$summary.random$baseline.hazard$ID,
exp(model_ovarian2$summary.random$baseline.hazard$mean+
rep(1,16)*(model_ovarian2$summary.fixed$mean[1]+
model_ovarian2$summary.fixed$mean[3]*mean(ovarian_data$age))),
col="red",type="l",xlab="Time(rescaled)",ylab="Hazard function",
ylim=c(0.4,1.5))
lines(model_ovarian2$summary.random$baseline.hazard$ID,
exp(model_ovarian2$summary.random$baseline.hazard$mean+
rep(1,16)*(model_ovarian2$summary.fixed$mean[2]+
model_ovarian2$summary.fixed$mean[3]*mean(ovarian_data$age))),
col="blue")
legend(x=0.1,y=1,legend=c("Treatment 1", "Treatment 2"), col=c("red","blue"),lty=c(1,1))
```
The (near) constant hazard functions imply that the exponential assumption (as in our previous example) was correct. The proportionality assumption should always be evaluated.
# Spatial survival models
A spatial model is just a "normal" model with a latent random effect over space. In the next example we illustrate how straightforward a spatial random effect can be incuded in the `inla` call. (More detail about spatial models on Thursday)
## Examples
### 2. Leukemkia survival data
A dataset on the survival of acute myeloid leukemia in 1,043 pateietns, first analyzed by Henderson et al. (2002). It is of interest to investigate possible spatial variation in survival after accounting for known subject-specific prognostic factors, which include age, sex, white blood cell count (wbc) at diagnosis, and the Townsend score (tpi) for which higher values indicates less affluent areas. Both exact residential locations of all patients and their administrative districts (24 districts that make up the whole region) are available.
```{r}
data("LeukSurv" , package="spBayesSurv")
d <- LeukSurv[order(LeukSurv$district), ]
d$time=d$time/max(d$time) #Very important
library(R2BayesX)
nwengland <- read.bnd(system.file("otherdata/nwengland.bnd", package = "spBayesSurv"))
adj.mat <- bnd2gra(nwengland)
E <- diag(diag(adj.mat)) - as.matrix(adj.mat) #Sharing boundaries matrix
d$U<-d$district
E
#convert to sparseMatrix
i_ind=NA
j_ind=NA
for (i in 1:nrow(E)){
for (j in 1:ncol(E)){
if (E[i,j]==1) {i_ind=c(i_ind,i)
j_ind=c(j_ind,j)}
}
}
g = sparseMatrix(i=i_ind[2:length(i_ind)],j=j_ind[2:length(j_ind)],dims=c(24,24))
g
#Alternatively,
#class(adj.mat) = NULL
#g = inla.read.graph(adj.mat)
####################Fit the spatial survival model using a Besag random effect
Y = inla.surv(time=d$time,event=d$cens)
model_leuk<-inla(formula=Y~age+as.factor(sex)+tpi+f(U,district,model="besag",
graph=g,scale.model=TRUE),data=d,family="exponentialsurv",
control.predictor = list(compute = TRUE), control.compute = list(dic=TRUE))
#Summary and extract predicted hazard per observation (NOT district)
summary(model_leuk)
hazard.fitted<-exp(model_leuk$summary.linear.predictor$`0.5quant`)
d$hazard.f<-hazard.fitted
#Plot the predicted effect of the district on the hazard
nwsp = bnd2sp(nwengland)
pcens = SpatialPolygonsDataFrame(nwsp,
aggregate(model_leuk$summary.random$U$mean,
list(d1=model_leuk$summary.random$U$ID),
function(v) {exp(v)}))
spplot(pcens,"x")
```
\bibliography{BioJ}