Comparison of survival models and assessment of risk factors for survival of cardiovascular patients at Addis Ababa Cardiac Center, Ethiopia: a retrospective study

Background Cardiovascular diseases (CVDs) are disorders of the heart and blood vessels. They are a major health problem across the world, and low- and middle-income countries account for 82% of CVD deaths. The aim of this study was to choose an appropriate survival model for cardiovascular patient data and to identify the factors that affect the survival of cardiovascular patients at Addis Ababa Cardiac Center.
Method A retrospective study was conducted on patients under follow-up at Addis Ababa Cardiac Center between September 2010 and December 2018. The included patients had received either pre-operation or post-operation care. Out of 1042 cardiac patients, a sample of 332 was selected using simple random sampling. Non-parametric, semi-parametric and parametric survival models were fitted and compared to select the most appropriate predictive model.
Results Among the 332 sampled cardiac patients, only 67 (20.2%) experienced the event (death) and the remaining 265 (79.8%) were censored. The median and the maximum survival times of cardiac patients were 1925 and 1403 days, respectively. The estimated hazard ratio of male to female patients was 1.926214 (95% CI: 1.111917–3.336847; p = 0.019), implying that the risk of death for male patients was 1.926214 times that of female cardiac patients, holding the other covariates in the model constant. Although all semi-parametric and parametric survival models fitted the data well, various model comparison criteria showed that the parametric Weibull AFT survival model performed better than the others.
Conclusions Governmental and non-governmental stakeholders should provide training on the risk factors identified in the current study to improve individuals' knowledge and awareness, so that deaths due to CVDs can be minimized.


Background
Cardiovascular disease (CVD) is a group of disorders of the heart and blood vessels. Coronary heart disease, cerebrovascular disease, peripheral arterial disease, rheumatic heart disease, congenital heart disease, deep vein thrombosis and pulmonary embolism are collectively called cardiovascular diseases (CVDs) 1,2,3. CVDs are the leading cause of mortality globally, and more people die annually from CVDs than from any other cause 1,5,6,7,8. The burden of CVD is not evenly distributed; it varies throughout the world in type and distribution, especially between developed and developing nations. An estimated 17.9 million people died from CVDs in 2016, representing 31% of all global deaths. Over three quarters of these deaths occurred in low- and middle-income countries 9. Due to globalization, aging and accelerated urbanization, CVD is the leading cause of death in Ethiopia 10. To reduce deaths among CVD patients, adequate information on the distribution of risk factors across geographic and socioeconomic groups of the population is needed. Providing this information is the ultimate goal and sole contribution of the current study.
Prevention of CVDs is always the highest priority, but scholars should pay equal attention to ways of prolonging the lives of patients affected by CVD. Appropriate interventions should be made to reduce mortality and morbidity due to CVD. Stakeholders spend considerable time identifying the most important risk factors for death among cardiovascular patients. The authors of the current study reviewed the most common risk factors in the literature 11,12,13,14. However, heterogeneity between patients is usually expected because of differences in biology, environment, health facilities, and physician experience and commitment. Thus, investigating risk factors for the same disease in different geographic areas, individuals and time periods is necessary, and it is the main concern of the current study. In addition, Addis Ababa Cardiac Center, established in February 2009, is the first center of its kind in Ethiopia, but its treatment efficiency in prolonging the survival of patients has not yet been well investigated.
Patient survival data are usually analyzed with survival models. Survival analysis involves the modeling of time-to-event data; in the current study, death or failure is considered the "event" 15. In the survival analysis literature, traditionally only a single event occurs, after which the organism or mechanism is dead or broken. Several methods have been developed for the analysis of survival data, such as the Kaplan-Meier estimator, the log-rank test, Cox regression and accelerated failure time (AFT) models, but depending on the complexity of the data, one may be more suitable than the others for predicting events 16,17,18,19. Thus, to make a realistic analysis, the most appropriate statistical model needs to be identified, and a model comparison was therefore carried out.
Moreover, since considering the entire data set is challenging in terms of time, human resources and finance, researchers are forced to use samples to make inferences about the population. The difficult task here is obtaining a representative and optimal sample, and non-statisticians usually find it challenging to choose the appropriate sample size determination formula for survival data analysis. Therefore, the current study introduces appropriate sample size determination procedures for survival data.
This paper is organized as follows. Section 2 describes the materials and methods. The basic findings of the study are presented and discussed in Section 3. Finally, concluding remarks are provided in Section 4.

Materials and Methods
The current study used secondary data from the cardiovascular patients' cards and information sheets at Addis Ababa Cardiac Center.

Study Population
The target population for the current study was cardiac patients who had received either pre- or post-operation care and who were under follow-up at Addis Ababa Cardiac Center from September 2010 to December 2018.

Variables of the study
Several variables expected to be associated with the death of cardiovascular patients were considered in the current study.

Dependent variable
The dependent variable of the current study is the survival time of cardiovascular patients, defined as the time from the date of admission for treatment until the date of death or censoring. Cardiovascular patients who were alive at the end of the study period or who dropped out before death were considered censored. The data set is therefore right-censored. Among the sample of 332 cardiac patients, only 67 (20.2%) had an observed event and the remaining 265 (79.8%) were censored. Usually, survival models require the censoring percentage not to exceed 50 percent 20. In the current study, however, the censoring percentage was 79.8%, higher than 50%, because few events were observed during the eight-year follow-up. Such a situation occurs frequently, and several studies in the literature have applied survival models with fewer observed events than expected 21.

Independent variables (covariates)
Based on literature reviews, the researchers' experience and expert suggestions, the authors considered the following explanatory variables: age, sex, hypertension/blood pressure, dyslipidemia, body mass index, smoking, alcohol use, diabetes mellitus, chest pain, pulse rate, educational status, region, income level, leg swelling, types of CVDs, family history, pericardium, orthopnea and patient status.

Inclusion and Exclusion Criteria
The study included cardiovascular patients aged 10 years or older who had received either pre- or post-operation care and who were under follow-up during the study period. Patients younger than 10 years and patients who had received neither pre- nor post-operation care were excluded.

Sampling Techniques and Sample Size Determination
In the current study, simple random sampling was used to select a representative sample from the large number of cardiac patients. Unlike other statistical methods, sample size determination in survival analysis should consider the following: the null hypothesis to be tested, the test statistic, the assumed effect size, the size of the test (significance level, α), the desired power, the sample size in terms of the number of events, the probability of an event during the study, the expected rate of loss, the sample size in terms of the number of cardiac patients, and adjustments for interim analysis. The present study used the hazard ratio from a previous study on cardiovascular patients in Ethiopia. We used the power (1 − β), the level of significance (α), the type II error (β) and equal allocation ($\pi_1 = \pi_2 = 0.5$):

$$\text{events} = \frac{(z_{\alpha/2} + z_{\beta})^{2}}{\pi_{1}\,\pi_{2}\,(\log HR)^{2}}$$

where $z_{\alpha/2}$ and $z_{\beta}$ are standard normal percentiles 22, equal to 1.96 at the α = 0.05 level of significance and 0.842 at β = 0.2 (desired power 0.8), respectively; HR = 0.32 is the hazard ratio of male to female cardiac patients; and $\text{pr(event)} = 1 - \pi_1 S_1(T) - \pi_2 S_2(T)$. Substituting these values gives

$$\text{events} = \frac{(1.96 + 0.842)^{2}}{0.5 \times 0.5 \times (\log 0.32)^{2}} = 128.688525$$

The values of $S_1(T)$ and $S_2(T)$ were obtained by assuming exponential survival times. For exponential failure times, the incidence rate is IR = λ and $S(t) = \exp(-\lambda t)$. An assumed IR was used to calculate the failure probability for the sample size calculation: with eight years of follow-up (T = 8) and equal allocation, the estimated IR for one group is 0.8 events per person over eight years (0.1 events per person-year). After obtaining the optimal sample size, we used simple random sampling to select the desired sample from a total of 1042 cardiac patients.
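The sample size steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the incidence rate of the second group is assumed to be HR × 0.1 events per person-year, which the text does not state explicitly. The logarithm base is left as a parameter because the text does not say which base was used.

```python
import math

def required_events(hr, z_alpha=1.96, z_beta=0.842, pi1=0.5, pi2=0.5, log=math.log):
    """Events needed to detect hazard ratio `hr`:
    (z_alpha + z_beta)^2 / (pi1 * pi2 * (log hr)^2)."""
    return (z_alpha + z_beta) ** 2 / (pi1 * pi2 * log(hr) ** 2)

def event_probability(rate1, rate2, follow_up, pi1=0.5, pi2=0.5):
    """pr(event) = 1 - pi1*S1(T) - pi2*S2(T), with exponential survival
    S(t) = exp(-rate * t) in each group."""
    s1 = math.exp(-rate1 * follow_up)
    s2 = math.exp(-rate2 * follow_up)
    return 1.0 - pi1 * s1 - pi2 * s2

def required_patients(events, pr_event):
    """Convert a required number of events into a number of patients."""
    return math.ceil(events / pr_event)

hr = 0.32                      # hazard ratio quoted in the text
rate_ref = 0.1                 # events per person-year, from the text
rate_other = hr * rate_ref     # ASSUMPTION: second group's rate is HR times the first
pr_event = event_probability(rate_ref, rate_other, follow_up=8)

# Use the number of events reported in the text as the input.
n_patients = required_patients(128.688525, pr_event)
print(round(pr_event, 4), n_patients)
```

Under these assumptions the reported 128.688525 events translate into 332 patients, which agrees with the sample size used in the study. As a side observation on the arithmetic only, `required_events(0.32)` with the natural logarithm gives roughly 24 events, while `required_events(0.32, log=math.log10)` gives roughly 128.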

Survival Models
Survival data, or time-to-event data, measure the time elapsed from a given origin to the occurrence of an event of interest. In survival analysis, the time variable is usually referred to as survival time because it gives the time that an individual has 'survived' over some follow-up period 23. There are three primary approaches to modeling survival processes: non-parametric, semi-parametric and parametric survival models.

Non-parametric Survival Models
Non-parametric analyses are widely used when there is doubt about the exact form of the distribution. Survival data are conveniently summarized through estimates of the survival function and hazard function, even though the distribution may not be pinned down exactly in mathematical form. Estimating the survival distribution provides descriptive statistics such as the median survival time. The median survival time is preferred to the mean because it does not depend on all event times being known, whereas the mean time to event requires all event times to be known, which is not always the case because of censoring. Moreover, the distribution of survival time is usually skewed and is therefore usually described by the median. In the current study, the Kaplan-Meier estimator was used to estimate the survival probability of cardiovascular patients, and the log-rank test was used to compare the survival of patients in different categories 24. If the Kaplan-Meier estimate remains above 50% for the whole observation period, the median survival time cannot be determined.
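To make the Kaplan-Meier estimator and the median rule above concrete, the following is a small self-contained sketch in plain Python. The function names and the toy follow-up data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up durations
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def median_survival(curve):
    """Smallest time with S(t) <= 0.5, or None if the curve stays above 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical toy data (days of follow-up, event indicator).
toy_times = [5, 8, 12, 12, 15, 20, 22, 30]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)

# Heavily censored toy data: the estimate never falls to 0.5.
censored_curve = kaplan_meier([3, 4, 6, 9, 10], [1, 0, 0, 0, 0])
censored_median = median_survival(censored_curve)
print(toy_curve, toy_median, censored_median)
```

The second toy data set illustrates the last sentence of the paragraph: when the estimated survival probability stays above 50% throughout follow-up, `median_survival` returns `None`.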

Semi-parametric Survival Models
The Cox regression model 25 is a semi-parametric survival model in which the baseline hazard takes no particular distributional form. It is often preferred to parametric survival models because it is highly versatile and contains both parametric and non-parametric parts simultaneously:

$$h(t \mid X) = \lambda_{0}(t)\exp(\beta' X)$$

where $\lambda_{0}(t)$ is the baseline hazard (the hazard when all covariates equal zero).
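In a model with this multiplicative form, the coefficient of a binary covariate corresponds to a hazard ratio of exp(β). The sketch below works backwards from the hazard ratio and 95% confidence interval reported for sex in the abstract to the implied coefficient, standard error and Wald p-value. It assumes a Wald-type interval that is symmetric on the log scale; the function name is hypothetical, and which fitted model produced the reported estimate is not assumed here.

```python
import math

def hazard_ratio_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Recover beta, its standard error, the Wald z statistic and the
    two-sided p-value from a hazard ratio and its confidence interval."""
    beta = math.log(hr)
    # Interval half-width on the log scale equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

beta, se, z, p = hazard_ratio_summary(1.926214, 1.111917, 3.336847)
print(round(beta, 4), round(se, 4), round(z, 2), round(p, 3))
```

The recovered p-value is consistent with the p = 0.019 reported in the abstract, which suggests the interval and p-value were derived from the same coefficient and standard error.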

Parametric Survival Models
Parametric survival models usually assume some shape for the hazard rate (e.g., flat or monotonic). The hazard rate is usually expressed as a function of the covariates, $h(t \mid X) = h_{0}(t)\exp(\beta_{0} + \beta' X)$, and the coefficients are interpreted as the effect of a change in X. When all the covariates equal zero, $h(t \mid X = 0) = h_{0}(t)\exp(\beta_{0})$, the baseline hazard. Among the popular parametric survival regression models, the authors considered the Weibull, exponential, log-normal and log-logistic models. An advantage of using a parametric distribution is that it is possible to predict time to event well beyond the period during which events occurred in the observed data.

Weibull and Exponential models
The Weibull and exponential models can be parameterized as both proportional hazards (PH) and accelerated failure time (AFT) models. The Weibull distribution is suitable for modeling data with monotone hazard rates that either increase or decrease exponentially with time, whereas the exponential distribution is suitable for modeling data with a constant hazard. For the PH model $h(t \mid X) = h_{0}(t)\exp(\beta_{0} + \beta' X)$, the term $h_{0}(t) = 1$ for exponential regression and $h_{0}(t) = \lambda t^{\lambda - 1}$ for Weibull regression, where λ is the shape parameter to be estimated from the data. Some authors refer not to λ but to $\sigma = 1/\lambda$ 26.
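The relationship between the two parameterizations can be illustrated with a short sketch. It assumes the Weibull PH hazard shape × t^(shape − 1) × exp(xb) and the standard conversion between PH and AFT linear predictors, xb_AFT = −xb_PH / shape. The function names and numeric values are hypothetical.

```python
import math

def weibull_ph_hazard(t, shape, xb):
    """PH form: h(t | x) = shape * t^(shape - 1) * exp(xb), for t > 0."""
    return shape * t ** (shape - 1) * math.exp(xb)

def weibull_ph_survival(t, shape, xb):
    """PH form: S(t | x) = exp(-t^shape * exp(xb))."""
    return math.exp(-(t ** shape) * math.exp(xb))

def ph_to_aft(xb_ph, shape):
    """Convert a PH linear predictor to the AFT metric."""
    return -xb_ph / shape

def weibull_aft_survival(t, shape, xb_aft):
    """AFT form: S(t | x) = exp(-(t * exp(-xb_aft))^shape)."""
    return math.exp(-((t * math.exp(-xb_aft)) ** shape))

# Hypothetical values for illustration only.
shape, xb_ph, t = 1.4, -6.0, 365.0
s_ph = weibull_ph_survival(t, shape, xb_ph)
s_aft = weibull_aft_survival(t, shape, ph_to_aft(xb_ph, shape))

# With shape = 1 the hazard no longer depends on t (exponential model).
h_early = weibull_ph_hazard(10.0, 1.0, xb_ph)
h_late = weibull_ph_hazard(1000.0, 1.0, xb_ph)
print(s_ph, s_aft, h_early, h_late)
```

Both forms describe the same survival curve, so the choice between PH and AFT output affects only how coefficients are read: as log hazard ratios or as log time ratios.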

Log-normal and Log-logistic models
The log-normal and log-logistic models are implemented only in the AFT form. These two distributions are similar and tend to produce comparable results. For the log-normal distribution, the natural logarithm of time follows a normal distribution; for the log-logistic distribution, the natural logarithm of time follows a logistic distribution. The log-normal survivor function is given by:

$$S(t_j) = 1 - \Phi\left(\frac{\ln t_j - \mu_j}{\sigma}\right)$$

where $\Phi$ is the standard normal cumulative distribution function. The log-normal regression is implemented by setting $\mu_j = x_j\beta$ and treating the standard deviation, σ, as an ancillary parameter to be estimated from the data. The log-logistic regression is obtained if $z_j$, the error term in $\ln t_j = x_j\beta + z_j$, has a logistic density. The log-logistic survivor function is given by:

$$S(t_j) = \left\{1 + (\lambda_j t_j)^{1/\gamma}\right\}^{-1}$$

This model is implemented by parameterizing $\lambda_j = \exp(-x_j\beta)$ and treating the scale parameter γ as an ancillary parameter estimated from the data 22.
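As an illustration, the two survivor functions can be evaluated with the standard library alone. The sketch assumes the usual AFT forms, S(t) = 1 − Φ((ln t − xb)/σ) for the log-normal model and S(t) = 1 / (1 + (exp(−xb) t)^(1/γ)) for the log-logistic model. The function names and parameter values are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_survival(t, xb, sigma):
    """S(t) = 1 - Phi((ln t - xb) / sigma), with mu = xb."""
    return 1.0 - std_normal_cdf((math.log(t) - xb) / sigma)

def loglogistic_survival(t, xb, gamma):
    """S(t) = 1 / (1 + (lambda * t)^(1/gamma)), with lambda = exp(-xb)."""
    lam = math.exp(-xb)
    return 1.0 / (1.0 + (lam * t) ** (1.0 / gamma))

# Hypothetical linear predictor and ancillary parameters.
xb, sigma, gamma = 7.0, 1.2, 0.8
t_median = math.exp(xb)  # both models have median survival time exp(xb)
ln_at_median = lognormal_survival(t_median, xb, sigma)
ll_at_median = loglogistic_survival(t_median, xb, gamma)
print(ln_at_median, ll_at_median)
```

In both models exp(xb) is the median survival time, which is one reason the two tend to give comparable results.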

Results
A summary of the demographic, socio-economic and environmental covariates of the 332 CVD patients who were under follow-up at Addis Ababa Cardiac Center is presented in Table 1. The Kaplan-Meier survival curves revealed that patients who drank alcohol and patients who smoked cigarettes had shorter survival times than patients who did not drink alcohol and patients who did not smoke, respectively. By type of CVD, survival time increased in the following order: CRHD, CHD, CAD, PDA, ASD and other types of CVDs. That is, patients with CRHD had the shortest survival time, and patients with each type had a shorter survival time than patients with any of the types listed after it. Similarly, overweight patients had shorter survival times than underweight and normal-weight patients, and underweight patients had shorter survival times than normal-weight patients. Likewise, the survival time of cardiac patients without orthopnea was shorter than that of cardiac patients with orthopnea.

Comparison of Survival Estimates among Categories of the Categorical Covariates Using the Log-rank Test
The differences in survival time across the categories of the covariates are also supported by the log-rank test.
As indicated in Table 2, sex, educational status, smoking, alcohol use, types of CVDs, blood pressure, pulse rate, body mass index, dyslipidemia, orthopnea, diabetes mellitus, chest pain and pericardium were significant covariates, whereas age, region, economic level, leg swelling and family history were not significant at the 5% level of significance.
The authors of the current study used multivariable survival analysis instead of univariate analysis to allow for the possibility that weakly associated variables could become important predictors of the outcome when taken together 27. Thus, model comparisons were made and significant risk factors were selected based on the most preferred model. The results of the multivariable Cox proportional hazards model in Table 3 showed that sex, educational status, types of CVDs, blood pressure, pericardium, alcohol use, pulse rate, chest pain and family history of cardiac disease were significant covariates at the 0.05 level of significance. Hence, these covariates had a significant effect on the survival of cardiovascular patients, as was also largely shown by the log-rank test. The researchers included only main effects in the multivariable model because none of the interactions between covariates were significant.

Parametric Survival Models
The Q-Q plot, AIC and log-likelihood were used to identify the appropriate model among the four widely used parametric survival models. The researchers used the Weibull survival model to determine the predictors of survival of CVD patients because it performed best on AIC and log-likelihood, as shown in Table 4. The researchers also checked the goodness of fit of the Weibull model using the likelihood ratio test.
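For readers unfamiliar with the log-rank test behind Table 2, the following is a minimal two-group sketch in plain Python. It compares observed and expected deaths in one group at each event time and refers the statistic to a chi-square distribution with one degree of freedom. The function name and the toy data are hypothetical and do not reproduce the study's results.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.
    Returns (chi-square statistic, p-value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square(1) equals the two-sided normal tail at sqrt(stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical toy groups: early deaths versus late deaths.
stat_diff, p_diff = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                                 [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
# Identical groups should show no difference.
stat_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
print(stat_diff, p_diff, stat_same, p_same)
```

For covariates with more than two categories, such as type of CVD, the test generalizes to a chi-square statistic with one fewer degree of freedom than the number of groups. That extension is not shown here.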
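The AIC-based choice among the four parametric models can be sketched as follows. The log-likelihoods and parameter counts below are hypothetical placeholders, not the values in Table 4. The sketch uses the usual definition AIC = −2 log L + 2k, where k is the number of estimated parameters.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 * logL + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# HYPOTHETICAL (log-likelihood, number of parameters) pairs for illustration.
candidates = {
    "exponential": (-250.0, 10),
    "weibull": (-240.0, 11),
    "log-normal": (-243.0, 11),
    "log-logistic": (-242.0, 11),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # smallest AIC is preferred
print(scores, best_model)
```

Because AIC penalizes each additional parameter, a model with a slightly higher log-likelihood is preferred only if the gain outweighs the penalty for its extra parameters.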

Quantile-Quantile Plot
As shown in Table 5, the full model is better than the null model at the 0.05 level of significance. The results of the Weibull AFT model presented in Table 6 showed that the explanatory variables sex, family history, educational status, types of CVDs, blood pressure (BP), pulse rate (PR), alcohol use and pericardium had significant effects on the survival of CVD patients at the 5% level of significance, similar to the Cox model.

Model Comparison and Selection
One wants to select the better model among several choices based on model performance for the following reasons. First, simpler models with fewer predictors and a less complicated structure are easier to understand. Second, one can keep adding features to the model without screening and obtain a better and better fit until a perfect fit is achieved, but the result is overfitting. The interest of the authors is to find the best-predicting model, not the best-fitting model.

The finding on sex contradicts an earlier study 28, which reported that the percentage of deaths is higher for women. However, it is consistent with the study conducted in Washington, which stated that male patients had lower survival time (higher hazard rate) than female patients 29,30. According to the present study, the risk of death of cardiovascular patients who use alcohol is higher than that of non-users. This study also revealed that blood pressure had a significant effect on the survival of cardiovascular patients. Patients with normal blood pressure had a lower risk of death (higher survival) than cardiac patients with abnormal (high or low) blood pressure. Similar findings 29,32 suggested that blood pressure is a well-known risk factor for cardiovascular diseases. Another similar study on cardiac patients at Tikur Anbesa Specialized Tertiary Referral Hospital, using the Cox regression model, identified blood pressure as one of the major significant factors affecting the survival of cardiac patients 31.
In the current study, age at the start of treatment and body mass index had no significant effect on the death of cardiovascular patients, but pulse rate and types of CVDs had a significant effect on survival. This finding contradicts an earlier study 14, which suggested that age and body mass index had a significant effect on the death of cardiovascular patients but that pulse rate and types of CVDs had no significant effect on survival. The result also contradicts previous results 33, which stated that the cardiovascular system is strongly affected by ageing. In the current study, the variable pericardium had a significant effect on the survival of cardiovascular patients: cardiac patients with an active pericardium survived longer than cardiac patients without an active pericardium.
The present study found that family history of cardiac disease had a significant effect on the survival of CVD patients. Cardiovascular patients with a family history had shorter survival times than patients with no family history. This finding is consistent with an earlier study 34, which states that family history of cardiac disease was the main cause of cardiovascular disease and that cardiac patients have a higher tendency to inherit heart disease.
Based on this finding, chest pain has a significant effect on the survival of cardiac patients. Cardiovascular patients with chest pain had shorter survival times (higher hazard) than cardiac patients without chest pain. Moreover, this study revealed that diabetes mellitus had no statistically significant effect on the survival of cardiac patients. In contrast, study 35 states that diabetes mellitus is an important chronic disease for CVD morbidity and mortality. Studies 36,37 also contradict the current finding.

Conclusions
By definition, only 50% of cardiac patients at Addis Ababa Cardiac Center survive beyond the median survival time; this proportion should be raised toward 80% so that the majority of patients survive longer. The survival of CVD patients was determined by their sex, type of CVD, pulse rate, blood pressure, chest pain, family history of cardiac disease, educational status, pericardium and alcohol use. Although both the semi-parametric and parametric survival models identified similar significant factors, the parametric Weibull AFT model predicted the cardiac data set well, even compared with the other parametric models. Health extension programs should be implemented nationwide in Ethiopia in order to inform policy, develop strategies and control the risk factors affecting the survival of cardiovascular patients. Thus, governmental and non-governmental organizations should provide training on the risk factors identified in the current paper so as to create awareness and reduce deaths among CVD patients.

Abbreviations
AIC: Akaike Information Criterion
CVD: Cardiovascular disease
BMI: Body Mass Index
HR: Hazard Ratio
AFT: Accelerated Failure Time
PH: Proportional Hazards
BP: Blood Pressure
CRHD: Chronic rheumatic heart disease
CHD: Congenital heart disease

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
Both authors, ZG and BY, participated in every phase of the manuscript. They conceived the idea and were involved in the design of the study, the statistical analysis and the write-up of the manuscript.

Availability of data and materials
Raw data can be made available on request to the corresponding author.

Funding
Financial support was obtained from Hawassa University, budgeted through the postgraduate program office of the College of Natural and Computational Sciences, for the completion of an MSc thesis.

Limitations of the study
We were unable to include important socio-demographic and socioeconomic factors such as physical activity, diet type, cholesterol level, place of residence, ethnicity and others that might contribute to the survival times of CVD patients. Globally, these factors are major causes of death that have increased substantially since 1990 in some large populations, and such variables should therefore not be omitted from mortality-related studies.