Title: | Copula Based Cox Proportional Hazards Models for Dependent Censoring |
---|---|
Description: | Copula based Cox proportional hazards models for survival data subject to dependent censoring. This approach does not assume that the parameter defining the copula is known. The dependency parameter is estimated with other finite model parameters by maximizing a Pseudo likelihood function. The cumulative hazard function is estimated via estimating equations derived based on martingale ideas. Available copula functions include Frank, Gumbel and Normal copulas. Only Weibull and lognormal models are allowed for the censoring model, even though any parametric model that satisfies certain identifiability conditions could be used. Implemented methods are described in the article "Copula based Cox proportional hazards models for dependent censoring" by Deresa and Van Keilegom (2024) <doi:10.1080/01621459.2022.2161387>. |
Authors: | Negera Wakgari Deresa [aut, cre]
|
Maintainer: | Negera Wakgari Deresa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.3 |
Built: | 2025-02-22 04:40:11 UTC |
Source: | https://github.com/nago2020/semipar.depcens |
This function estimates the bootstrap standard errors for the finite-dimensional model parameters and for the non-parametric cumulative hazard function. Parallel computing using foreach has been used to speed up the estimation of standard errors.
boot.fun( init, resData, X, W, lhat, cumL, dist, k, lb, ub, Obs.time, cop, n.boot, n.iter, ncore, eps )
boot.fun( init, resData, X, W, lhat, cumL, dist, k, lb, ub, Obs.time, cop, n.boot, n.iter, ncore, eps )
init |
Initial values for the finite dimensional parameters obtained from the fit of |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be a vector of ones |
lhat |
Initial values for the hazard function obtained from the fit of |
cumL |
Initial values for the cumulative hazard function obtained from the fit of |
dist |
The distribution to be used for the dependent censoring time C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
k |
Dimension of X |
lb |
lower boundary for finite dimensional parameters |
ub |
Upper boundary for finite dimensional parameters |
Obs.time |
Observed survival time, which is the minimum of T, C and A, where A is the administrative censoring time. |
cop |
Which copula should be computed to account for dependency between T and C. This argument can take
one of the values from |
n.boot |
Number of bootstraps to use in the estimation of bootstrap standard errors. |
n.iter |
Number of iterations; the default is |
ncore |
The number of cores to use for parallel computation is configurable, with the default |
eps |
Convergence error. This is set by the user in such away that the desired convergence is met; the default is |
Bootstrap standard errors for parameter estimates and for estimated cumulative hazard function.
This function estimates the bootstrap standard errors for the finite-dimensional model parameters and for the non-parametric cumulative hazard function under the assumption of independent censoring. Parallel computing using foreach has been used to speed up the computation.
boot.funI( init, resData, X, W, lhat, cumL, dist, k, lb, ub, Obs.time, n.boot, n.iter, ncore, eps )
boot.funI( init, resData, X, W, lhat, cumL, dist, k, lb, ub, Obs.time, n.boot, n.iter, ncore, eps )
init |
Initial values for the finite dimensional parameters obtained from the fit of |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be a vector of ones |
lhat |
Initial values for the hazard function obtained from the fit of |
cumL |
Initial values for the cumulative hazard function obtained from the fit of |
dist |
The distribution to be used for the dependent censoring time C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
k |
Dimension of X |
lb |
lower boundary for finite dimensional parameters |
ub |
Upper boundary for finite dimensional parameters |
Obs.time |
Observed survival time, which is the minimum of T, C and A, where A is the administrative censoring time. |
n.boot |
Number of bootstraps to use in the estimation of bootstrap standard errors. |
n.iter |
Number of iterations; the default is |
ncore |
The number of cores to use for parallel computation is configurable, with the default |
eps |
Convergence error. This is set by the user in such away that the desired convergence is met; the default is |
Bootstrap standard errors for parameter estimates and for estimated cumulative hazard function.
This function estimates phi function at fixed time point t
CompC(theta, t, X, W, ld, cop, dist)
CompC(theta, t, X, W, ld, cop, dist)
theta |
Estimated parameter values/initial values for finite dimensional parameters |
t |
A fixed time point |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be ones |
ld |
Output of |
cop |
Which copula should be computed to account for dependency between T and C. This argument can take
one of the values from |
dist |
The distribution to be used for the dependent censoring C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
This function computes distance between two vectors based on L2-norm
Distance(a, b)
Distance(a, b)
a |
First vector |
b |
Second vector |
This function allows to estimate the dependency parameter along all other model parameters. First, estimates the cumulative hazard function, and then at the second stage it estimates other model parameters assuming that the cumulative hazard function is known. The details for implementing the dependent censoring methodology can be found in Deresa and Van Keilegom (2024).
fitDepCens( resData, X, W, cop = c("Frank", "Gumbel", "Normal"), dist = c("Weibull", "lognormal"), start = NULL, n.iter = 50, bootstrap = TRUE, n.boot = 150, ncore = 7, eps = 1e-04 )
fitDepCens( resData, X, W, cop = c("Frank", "Gumbel", "Normal"), dist = c("Weibull", "lognormal"), start = NULL, n.iter = 50, bootstrap = TRUE, n.boot = 150, ncore = 7, eps = 1e-04 )
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T. |
W |
Data matrix with covariates related to C. First column of W should be a vector of ones. |
cop |
Which copula should be computed to account for dependency between T and C. This argument can take
one of the values from |
dist |
The distribution to be used for the censoring time C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
start |
Initial values for the finite dimensional parameters. If |
n.iter |
Number of iterations; the default is |
bootstrap |
A boolean indicating whether to compute bootstrap standard errors for making inferences. |
n.boot |
Number of bootstrap samples to use in the estimation of bootstrap standard errors if |
ncore |
The number of cores to use for parallel computation in bootstrapping, with the default |
eps |
Convergence error. This is set by the user in such away that the desired convergence is met; the default is |
This function returns a fit of dependent censoring model; parameter estimates, estimate of the cumulative hazard function, bootstrap standard errors for finite-dimensional parameters, the nonparametric cumulative hazard function, etc.
Deresa and Van Keilegom (2024). Copula based Cox proportional hazards models for dependent censoring, Journal of the American Statistical Association, 119:1044-1054.
# Toy data example to illustrate implementation n = 300 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) # generate dependency structure from Frank frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) # should be data frame colnames(W) <- c("ones","cov1") colnames(X) <- "cov.surv" # Fit dependent censoring model fit <- fitDepCens(resData = resData, X = X, W = W, bootstrap = FALSE) # parameter estimates fit$parameterEstimates # summary fit results summary(fit) # plot cumulative hazard vs time plot(fit$observedTime, fit$cumhazardFunction, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
# Toy data example to illustrate implementation n = 300 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) # generate dependency structure from Frank frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) # should be data frame colnames(W) <- c("ones","cov1") colnames(X) <- "cov.surv" # Fit dependent censoring model fit <- fitDepCens(resData = resData, X = X, W = W, bootstrap = FALSE) # parameter estimates fit$parameterEstimates # summary fit results summary(fit) # plot cumulative hazard vs time plot(fit$observedTime, fit$cumhazardFunction, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
This function allows to estimate all model parameters under the assumption of independent censoring. First, estimates the cumulative hazard function, and then at the second stage it estimates other model parameters assuming that the cumulative hazard is known.
fitIndepCens( resData, X, W, dist = c("Weibull", "lognormal"), start = NULL, n.iter = 50, bootstrap = TRUE, n.boot = 150, ncore = 7, eps = 1e-04 )
fitIndepCens( resData, X, W, dist = c("Weibull", "lognormal"), start = NULL, n.iter = 50, bootstrap = TRUE, n.boot = 150, ncore = 7, eps = 1e-04 )
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T. |
W |
Data matrix with covariates related to C. First column of W should be ones. |
dist |
The distribution to be used for the censoring time C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
start |
Initial values for the finite dimensional parameters. If |
n.iter |
Number of iterations; the default is |
bootstrap |
A boolean indicating whether to compute bootstrap standard errors for making inferences. |
n.boot |
Number of bootstrap samples to use in the estimation of bootstrap standard errors if |
ncore |
The number of cores to use for parallel computation is configurable, with the default |
eps |
Convergence error. This is set by the user in such away that the desired convergence is met; the default is |
This function returns a fit of independent censoring model; parameter estimates, estimate of the cumulative hazard function, bootstrap standard errors for finite-dimensional parameters, the nonparametric cumulative hazard function, etc.
# Toy data example to illustrate implementation n = 300 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) # generate dependency structure from Frank frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) # should be data frame colnames(W) <- c("ones","cov1") colnames(X) <- "cov.surv" # Fit independent censoring model fitI <- fitIndepCens(resData = resData, X = X, W = W, bootstrap = FALSE) # parameter estimates fitI$parameterEstimates # summary fit results summary(fitI) # plot cumulative hazard vs time plot(fitI$observedTime, fitI$cumhazardFunction, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
# Toy data example to illustrate implementation n = 300 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) # generate dependency structure from Frank frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) # should be data frame colnames(W) <- c("ones","cov1") colnames(X) <- "cov.surv" # Fit independent censoring model fitI <- fitIndepCens(resData = resData, X = X, W = W, bootstrap = FALSE) # parameter estimates fitI$parameterEstimates # summary fit results summary(fitI) # plot cumulative hazard vs time plot(fitI$observedTime, fitI$cumhazardFunction, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
Competing risk data set involving follicular cell lymphoma as given in the book of Pintilie (2006). The data consist of 541 patients with early disease stage (I or II) and treated with radiation alone (RT) or with radiation and chemotherapy (CMT). The endpoints of interest are what comes first: relapse of the disease or death in remission.
A data frame with 541 rows and 7 variables:
age: age
clinstg: clinical stage: 1=stage I, 2=stage II
ch: chemotherapy
rt: radiotherapy
hgb: hemoglobin level in g/l
time: first failure time
status: censoring status; 0=censored, 1=relapse, 2=death
Pintilie, M. (2006), Competing Risks: A Practical Perspective, Chichester: Wiley.
Simulate a toy example for testing
genData(n)
genData(n)
n |
sample size |
Loglikehood function under independent censoring
LikCopInd(theta, resData, X, W, lhat, cumL, dist)
LikCopInd(theta, resData, X, W, lhat, cumL, dist)
theta |
Estimated parameter values/initial values for finite dimensional parameters |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be ones |
lhat |
The estimated hazard function obtained from the output of |
cumL |
The estimated cumulative hazard function from the output of |
dist |
The distribution to be used for the dependent censoring C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
Maximized log-likelihood value
Change hazard and cumulative hazard to long format
Longfun(Z, T1, lhat, Lhat)
Longfun(Z, T1, lhat, Lhat)
Z |
Observed survival time, which is the minimum of T, C and A, where A is the administrative censoring time. |
T1 |
Distinct observed survival time |
lhat |
Hazard function estimate |
Lhat |
Cumulative hazard function estimate |
The PseudoL
function is maximized in order to
estimate the finite dimensional model parameters, including the dependency parameter.
This function assumes that the cumulative hazard function is known.
PseudoL(theta, resData, X, W, lhat, cumL, cop, dist)
PseudoL(theta, resData, X, W, lhat, cumL, cop, dist)
theta |
Estimated parameter values/initial values for finite dimensional parameters |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be ones |
lhat |
The estimated hazard function obtained from the output of |
cumL |
The estimated cumulative hazard function from the output of |
cop |
Which copula should be computed to account for dependency between T and C. This argument can take
one of the values from |
dist |
The distribution to be used for the dependent censoring C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
maximized log-likelihood value
Function to indicate position of t in observed survival time
SearchIndicate(t, T1)
SearchIndicate(t, T1)
t |
fixed time t |
T1 |
distinct observed survival time |
This function estimates the cumulative hazard function of survival time (T) under dependent censoring (C). The estimation makes use of the estimating equations derived based on martingale ideas.
SolveL( theta, resData, X, W, cop = c("Frank", "Gumbel", "Normal"), dist = c("Weibull", "lognormal") )
SolveL( theta, resData, X, W, cop = c("Frank", "Gumbel", "Normal"), dist = c("Weibull", "lognormal") )
theta |
Estimated parameter values/initial values for finite dimensional parameters |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
W |
Data matrix with covariates related to C. First column of W should be ones |
cop |
Which copula should be computed to account for dependency between T and C. This argument can take
one of the values from |
dist |
The distribution to be used for the dependent censoring C. Only two distributions are allowed, i.e, Weibull
and lognormal distributions. With the value |
This function returns an estimated hazard function, cumulative hazard function and distinct observed survival times;
n = 200 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) theta = c(0.3,1,0.3,1,2) # Estimate cumulative hazard function cumFit <- SolveL(theta, resData,X,W) cumhaz = cumFit$cumhaz time = cumFit$times # plot hazard vs time plot(time, cumhaz, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
n = 200 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) theta = c(0.3,1,0.3,1,2) # Estimate cumulative hazard function cumFit <- SolveL(theta, resData,X,W) cumhaz = cumFit$cumhaz time = cumFit$times # plot hazard vs time plot(time, cumhaz, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
This function estimates the cumulative hazard function of survival time (T) under the assumption of independent censoring. The estimating equation is derived based on martingale ideas.
SolveLI(theta, resData, X)
SolveLI(theta, resData, X)
theta |
Estimated parameter values/initial values for finite dimensional parameters |
resData |
Data matrix with three columns; Z = the observed survival time, d1 = the censoring indicator of T and d2 = the censoring indicator of C. |
X |
Data matrix with covariates related to T |
This function returns an estimated hazard function, cumulative hazard function and distinct observed survival times;
n = 200 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) theta = c(0.3,1,0.3,1) # Estimate cumulative hazard function cumFit_ind <- SolveLI(theta, resData,X) cumhaz = cumFit_ind$cumhaz time = cumFit_ind$times # plot hazard vs time plot(time, cumhaz, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
n = 200 beta = c(0.5) lambd = 0.35 eta = c(0.9,0.4) X = cbind(rbinom(n,1,0.5)) W = cbind(rep(1,n),rbinom(n,1,0.5)) frank.cop <- copula::frankCopula(param = 5,dim = 2) U = copula::rCopula(n,frank.cop) T1 = (-log(1-U[,1]))/(lambd*exp(X*beta)) # Survival time' T2 = (-log(1-U[,2]))^(1.1)*exp(W%*%eta) # Censoring time A = runif(n,0,15) # administrative censoring time Z = pmin(T1,T2,A) d1 = as.numeric(Z==T1) d2 = as.numeric(Z==T2) resData = data.frame("Z" = Z,"d1" = d1, "d2" = d2) theta = c(0.3,1,0.3,1) # Estimate cumulative hazard function cumFit_ind <- SolveLI(theta, resData,X) cumhaz = cumFit_ind$cumhaz time = cumFit_ind$times # plot hazard vs time plot(time, cumhaz, type = "l",xlab = "Time", ylab = "Estimated cumulative hazard function")
depCensoringFit
objectSummary of depCensoringFit
object
## S3 method for class 'depFit' summary(object, ...)
## S3 method for class 'depFit' summary(object, ...)
object |
Output of |
... |
Further arguments |
Summary of dependent censoring model fit in the form of table
indepCensoringFit
objectSummary of indepCensoringFit
object
## S3 method for class 'indepFit' summary(object, ...)
## S3 method for class 'indepFit' summary(object, ...)
object |
Output of |
... |
Further arguments |
Summary of independent censoring model fit in the form of table