MULTIVARIATE TIME SERIES MODELING OF SELECTED CHILDHOOD DISEASES IN AKWA IBOM STATE

This paper models the five most prevalent childhood diseases in Akwa Ibom State using a multivariate approach to time series. A total of 78,839 reported cases of malaria, upper respiratory tract infection (URTI), pneumonia, anaemia and tetanus were extracted from the records of five randomly selected hospitals in the State for the period 1997 to 2011. The monthly cumulative clinical cases of these childhood diseases constitute a vector time series. A prewhitening approach was employed to determine whether the components of the vector series are interrelated, so that each series can be predicted on the basis of lagged values of itself and of the others. This process revealed that the malaria, URTI, pneumonia and anaemia series are interrelated, whereas the tetanus series is not related to the others. Hence, the four interrelated time series were considered in the multivariate analysis. Order selection criteria were employed to determine the order of the vector autoregressive (VAR) model to be fitted to these series, and a VAR(1) model was found to fit well. Diagnostic checks applied to ascertain the adequacy of the model confirmed that the VAR(1) model is appropriate. Forecasts were generated. The model revealed that upper respiratory tract infection, pneumonia and anaemia are linked to or caused by malaria.


INTRODUCTION
Childhood diseases are diseases that affect children from birth to 14 years of age. Some common childhood diseases include chickenpox, influenza, measles, whooping cough, anaemia, asthma, malaria, tetanus, pneumonia, upper respiratory tract infection (URTI), polio, tuberculosis, fever and HIV/AIDS. However, the most common ones in Nigeria are malaria, URTI, pneumonia, anaemia and tetanus.
Diseases can be devastating for anyone, but they seem particularly unfair when they attack children. Unfortunately, many diseases infect children more frequently and more severely than adults. Children are more susceptible to diseases for a number of reasons. The major reason is that they are often exposed to diseases before they have built the immunologic defenses required to fend them off (Perlin and Cohen, 2002). UNICEF (2009) disclosed that no fewer than one million Nigerian children, especially those below the age of five, are lost yearly to preventable childhood killer diseases. Nigeria is one of the least successful African countries in improving child survival over the past four decades, in spite of its wealth in human and natural resources. Infant and childhood mortality rates are exceedingly high, and Nigeria ranks 15th highest in the world among countries with high under-five mortality.
Malaria destroys the red blood cells, thereby causing anaemia. Anaemia is a common manifestation of malaria caused by four species of plasmodia. Prolonged malaria reduces the immunity of the body, which may give rise to URTI and pneumonia. Pneumonia results from a failure of a series of host defense mechanisms that keep the respiratory tract free of infection. Many patients with pneumonia will have had mild upper respiratory symptoms and malaise for several days before the onset of pneumonia (Stein et al, 1994). Sometimes children have an illness that is not curable but persists into adulthood. Coping with childhood illness can be very difficult at first, not only for the child but for the whole family. In addition to the child's physical health and medical needs, one needs to manage the feelings that come with all the changes and health issues (Sawyer et al, 2003).
Purohit et al (1998) examined the effect of seasonality and other temporal patterns on the occurrence of rotavirus diarrhea among hospitalized cases at Pune, India, using the Box and Jenkins approach. A seasonal autoregressive integrated moving average (SARIMA) model was fitted to the data. The model suggested a strong influence of climatic changes on the incidence of the disease. Vanbrackle and William (1999) examined statistical properties relevant to detecting unusual patterns in reported cases of diseases from the Centers for Disease Control and Prevention. ARIMA models fitted to the reported cases of different diseases were used to generate one-step-ahead forecasts. Wangdi et al (2010) focused on modeling and forecasting malaria incidence in endemic districts of Bhutan using time series and ARIMAX analysis. A SARIMA (2,1,1)(0,1,1) model was found appropriate for the endemic districts overall and was used to forecast the number of cases in these areas. Inconsistency was noticed in the forecasts from the ARIMAX model.
Gharbi et al (2011) studied the incidence of dengue in Guadeloupe, French West Indies, using the Box and Jenkins approach to fit a seasonal autoregressive integrated moving average (SARIMA) model to clinically suspected cases. Tian et al (2012) examined the effects of ambient temperature on coronary heart disease (CHD) mortality in Beijing, China, using both time series and time-stratified case-crossover models. The time series models had a better fit than the time-stratified case-crossover models. Abeku et al (2014) assessed the accuracy of different methods of forecasting malaria incidence from historical morbidity patterns in areas with unstable transmission. Simple seasonal adjustment methods outperformed a statistically more advanced ARIMA method.
This work seeks to build a multivariate time series model for interrelated childhood diseases so that each series can be predicted on the basis of lagged values of itself and of the others.

METHODOLOGY
According to Box et al (2008), the study of multivariate processes requires a framework for describing not only the properties of the individual series but also the possible cross relationships among the series. These relationships are often studied through consideration of the correlation structures among the component series. Prewhitening is used in this work to determine the interrelationships among the time series.

Prewhitening of Vector Time Series
According to Brockwell and Davies (2002), vector time series are prewhitened by transforming each series to white noise through the application of a suitable filter before the cross-correlations are computed. For example, if $\{X_{1,t}\}$ and $\{X_{2,t}\}$ are invertible ARMA(p,q) processes, this can be achieved by the transformations
$$Z_{i,t} = \sum_{j=0}^{\infty} \pi_j^{(i)} X_{i,t-j}, \qquad i = 1, 2,$$
where $\sum_{j=0}^{\infty} \pi_j^{(i)} z^j = \phi^{(i)}(z)/\theta^{(i)}(z)$ and $\phi^{(i)}$, $\theta^{(i)}$ are the autoregressive and moving average polynomials of the ith series, i = 1,2.
It is convenient to replace the sequences $\{Z_{i,t}\}$ by the residuals $\{w_{i,t}\}$ obtained after fitting maximum likelihood ARMA models to each of the component series. If the fitted ARMA models were in fact the true models, the series $\{w_{i,t}\}$ would be white noise sequences for i = 1,2.
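The prewhitening idea can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: it assumes simulated data, replaces the maximum likelihood ARMA fits with a least-squares AR(1) filter, and uses the rough bound ±2/√n to judge the lag-0 cross-correlation of the residuals. All function and variable names are hypothetical.

```python
import math
import random

def ar1_residuals(x):
    """Fit x_t = c + phi * x_{t-1} + w_t by least squares and return the residuals w_t."""
    y, lagged = x[1:], x[:-1]
    n = len(y)
    mean_y, mean_l = sum(y) / n, sum(lagged) / n
    sxy = sum((a - mean_l) * (b - mean_y) for a, b in zip(lagged, y))
    sxx = sum((a - mean_l) ** 2 for a in lagged)
    phi = sxy / sxx
    c = mean_y - phi * mean_l
    return [b - c - phi * a for a, b in zip(lagged, y)]

def cross_correlation(u, v, lag=0):
    """Sample correlation between u_{t+lag} and v_t for lag >= 0."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    num = sum((u[t + lag] - mu) * (v[t] - mv) for t in range(n - lag))
    return num / (su * sv)

def simulate_ar1(phi, shocks):
    """Generate an AR(1) path driven by the given shocks, starting from zero."""
    x, prev = [], 0.0
    for e in shocks:
        prev = phi * prev + e
        x.append(prev)
    return x

random.seed(0)
n = 500
common = [random.gauss(0, 1) for _ in range(n)]
# Series a and b share a common shock; series c is independent of both.
shocks_a = [s + 0.5 * random.gauss(0, 1) for s in common]
shocks_b = [s + 0.5 * random.gauss(0, 1) for s in common]
shocks_c = [random.gauss(0, 1) for _ in range(n)]
series_a = simulate_ar1(0.7, shocks_a)
series_b = simulate_ar1(0.4, shocks_b)
series_c = simulate_ar1(0.6, shocks_c)

w_a, w_b, w_c = (ar1_residuals(s) for s in (series_a, series_b, series_c))
bound = 2 / math.sqrt(len(w_a))      # approximate 5% significance bound
r_ab = cross_correlation(w_a, w_b)   # related series: far outside the bound
r_ac = cross_correlation(w_a, w_c)   # unrelated series: close to zero
print(f"bound={bound:.3f}  r_ab={r_ab:.3f}  r_ac={r_ac:.3f}")
```

Computing the cross-correlation on the residuals rather than on the raw series removes the spurious correlation that the autocorrelation within each series would otherwise induce.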

Multivariate Time Series
Multivariate time series analysis is the study of statistical models and methods of analysis that describe the relationships among several time series (Box et al, 2008). Here we consider m related time series variables of interest in a dynamic system, $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{m,t})'$, and wish to gain a deeper understanding of the dynamic relationships over time among the series and to improve the accuracy of forecasts for individual series by utilizing the additional information available from the related series.

Vector Autoregression
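A vector autoregression is a system in which each variable is regressed on a constant and p of its own lags as well as on p lags of each of the other variables in the vector autoregressive (VAR) model (Hamilton, 1994). Let $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{m,t})'$ denote an (m × 1) vector of time series variables. The pth-order vector autoregressive model, denoted VAR(p), has the form
$$X_t = C + \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \cdots + \Phi_p X_{t-p} + \varepsilon_t \qquad (1)$$
Here C denotes an (m × 1) vector of constants and $\Phi_j$ an (m × m) matrix of coefficients for j = 1,2,…,p. The (m × 1) vector $\varepsilon_t$ is a vector generalization of white noise:
$$E(\varepsilon_t) = 0, \qquad E(\varepsilon_t \varepsilon_\tau') = \begin{cases} \Sigma_\varepsilon, & t = \tau \\ 0, & \text{otherwise} \end{cases}$$
where $\Sigma_\varepsilon$ is an (m × m) symmetric positive definite matrix. Using lag operator notation, (1) can be written in the form
$$\Phi(L) X_t = C + \varepsilon_t, \qquad \Phi(L) = I_m - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p$$
Here Φ(L) indicates an (m × m) matrix polynomial in the lag operator L. The vector autoregression is covariance stationary if all values of z satisfying $|I_m - \Phi_1 z - \Phi_2 z^2 - \cdots - \Phi_p z^p| = 0$ lie outside the unit circle.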
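As a rough illustration of the definition above, the following Python sketch simulates a bivariate VAR(1), estimates C and Φ by regressing each variable on a constant and the first lag of both variables, and checks the stationarity condition. It is a sketch under stated assumptions, not the estimation routine used in the paper: the coefficient values are hypothetical, the helper names are invented for this example, and the stationarity check is written for the 2 × 2 case only, where the roots of |I − Φz| = 0 lie outside the unit circle exactly when both eigenvalues of Φ lie inside it.

```python
import cmath
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_var1(data):
    """Equation-by-equation least squares of x_t on (1, x_{t-1}); returns (C, Phi)."""
    k = len(data[0])
    regressors = [[1.0] + list(row) for row in data[:-1]]  # constant plus lag 1
    targets = data[1:]
    # The normal-equation matrix Z'Z is shared by all k equations.
    ztz = [[sum(z[i] * z[j] for z in regressors) for j in range(k + 1)]
           for i in range(k + 1)]
    const, phi = [], []
    for eq in range(k):
        zty = [sum(z[i] * y[eq] for z, y in zip(regressors, targets))
               for i in range(k + 1)]
        beta = solve(ztz, zty)
        const.append(beta[0])
        phi.append(beta[1:])
    return const, phi

def is_stationary_2x2(phi):
    """True when both eigenvalues of the 2 x 2 matrix phi have modulus below 1."""
    trace = phi[0][0] + phi[1][1]
    det = phi[0][0] * phi[1][1] - phi[0][1] * phi[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2)) < 1

# Simulate from a hypothetical bivariate VAR(1) with unit-variance white noise.
random.seed(1)
true_c = [1.0, 0.5]
true_phi = [[0.5, 0.1], [0.2, 0.3]]
x, data = [0.0, 0.0], []
for _ in range(3000):
    x = [true_c[i] + sum(true_phi[i][j] * x[j] for j in range(2)) + random.gauss(0, 1)
         for i in range(2)]
    data.append(x)

c_hat, phi_hat = fit_var1(data)
print("estimated Phi:", phi_hat)
print("stationary:", is_stationary_2x2(phi_hat))
```

For a VAR(p) with p > 1 or more than two variables, the same check would be applied to the eigenvalues of the companion matrix, which requires a general eigenvalue routine.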

Covariance and Correlation Matrices of VAR(1) Process
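The relationships among the components of the vector series are often studied through consideration of the correlation structures among the component series (Box et al, 2008). In particular, let us suppose that $X_t$ is a stationary VAR(1) model
$$X_t = C + \Phi X_{t-1} + \varepsilon_t$$
where $\varepsilon_t$ is an m × 1 vector white noise with mean zero and covariance matrix $\Sigma_\varepsilon$. Alternatively, the process may be written in mean adjusted form as
$$X_t - \mu = \Phi (X_{t-1} - \mu) + \varepsilon_t, \qquad \mu = E(X_t) = (I_m - \Phi)^{-1} C$$
With $\Gamma(l) = E[(X_t - \mu)(X_{t+l} - \mu)']$ denoting the lag-l covariance matrix, the covariance matrices satisfy
$$\Gamma(0) = \Gamma(1)' \Phi' + \Sigma_\varepsilon \qquad (4)$$
$$\Gamma(l) = \Gamma(l-1) \Phi', \qquad l \geq 1 \qquad (5)$$
If $\Phi$ and $\Sigma_\varepsilon$ are given, $\Gamma(0)$ can be determined using (4) and (5), which together give $\Gamma(0) = \Phi \Gamma(0) \Phi' + \Sigma_\varepsilon$. For the VAR(p) model, we have
$$\Gamma(l) = \sum_{j=1}^{p} \Gamma(l-j) \Phi_j', \qquad l = 1, 2, \ldots \qquad (7)$$
Given the matrices $\Gamma(0), \Gamma(1), \cdots, \Gamma(p)$, (7) can be used to determine the coefficient matrices $\Phi_1, \cdots, \Phi_p$. The correlation matrix function for the vector process is defined by
$$\rho(l) = D^{-1/2} \Gamma(l) D^{-1/2}$$
where D is the diagonal matrix in which the ith diagonal element is the variance of the ith process. That is,
$$D = \mathrm{diag}\big(\gamma_{11}(0), \gamma_{22}(0), \ldots, \gamma_{mm}(0)\big)$$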
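A small numerical sketch may help make the covariance relations for the VAR(1) process concrete. It assumes the convention Γ(l) = E[(X_t − μ)(X_{t+l} − μ)′], so that Γ(0) = ΦΓ(0)Φ′ + Σ and Γ(1) = Γ(0)Φ′, and it obtains Γ(0) by simple fixed-point iteration, which converges for a stationary process. The matrices Φ and Σ below are hypothetical values chosen for illustration, and the helper names are not from the paper.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gamma0(phi, sigma, iterations=500):
    """Iterate G <- Phi G Phi' + Sigma; converges when the VAR(1) is stationary."""
    g = [row[:] for row in sigma]
    for _ in range(iterations):
        g = add(matmul(matmul(phi, g), transpose(phi)), sigma)
    return g

def correlation(gamma_l, gamma_0):
    """rho(l) = D^{-1/2} Gamma(l) D^{-1/2}, with D the diagonal of variances in Gamma(0)."""
    sd = [math.sqrt(gamma_0[i][i]) for i in range(len(gamma_0))]
    return [[gamma_l[i][j] / (sd[i] * sd[j]) for j in range(len(sd))]
            for i in range(len(sd))]

phi = [[0.5, 0.1], [0.2, 0.3]]      # hypothetical coefficient matrix
sigma = [[1.0, 0.3], [0.3, 1.0]]    # hypothetical white noise covariance matrix

g0 = gamma0(phi, sigma)
g1 = matmul(g0, transpose(phi))     # Gamma(1) = Gamma(0) Phi'
rho0 = correlation(g0, g0)          # lag-0 correlation matrix, unit diagonal
rho1 = correlation(g1, g0)          # lag-1 cross-correlation matrix
print("Gamma(0):", g0)
print("rho(1):", rho1)
```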

Criteria for VAR Order Selection
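The selection criteria for VAR(p) models are of the general form
$$Cr(p) = \ln|\tilde{\Sigma}_\varepsilon(p)| + c_T \, \varphi(K, p)$$
where $\tilde{\Sigma}_\varepsilon(p) = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_t'$ is the maximum likelihood estimate of $\Sigma_\varepsilon$ obtained by fitting a VAR(p) model, $c_T$ is a nondecreasing sequence of real numbers that depends on the sample size T, K is the dimension of the time series (m in the notation above) and $\varphi(K, p) = pK^2$ is the number of VAR parameters in a model of order p (Lutkepohl, 2005).
The most commonly used information criteria for selecting lag orders are the Akaike (AIC), Schwarz-Bayesian (BIC) and Hannan-Quinn (HQC) criteria. The first is
$$AIC(p) = \ln|\tilde{\Sigma}_\varepsilon(p)| + \frac{2}{T} pK^2$$
where $pK^2$ is the number of freely estimated parameters. The estimate $\hat{p}(AIC)$ for p is chosen so as to minimize the value of the criterion. Also,
$$BIC(p) = \ln|\tilde{\Sigma}_\varepsilon(p)| + \frac{\ln T}{T} pK^2$$
The order estimate $\hat{p}(BIC)$ is chosen so as to minimize the value of the criterion. The third criterion is
$$HQC(p) = \ln|\tilde{\Sigma}_\varepsilon(p)| + \frac{2 \ln \ln T}{T} pK^2$$
The estimate $\hat{p}(HQC)$ is the order that minimizes HQC(p) for p = 0,1,⋯,P, where P is the maximum order considered. The AIC criterion asymptotically overestimates the order with positive probability, whereas the BIC and HQC criteria estimate the order consistently under fairly general conditions if the true order p is less than or equal to P.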
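The three criteria differ only in the penalty attached to the pK² coefficients, which the short Python sketch below makes explicit. The ln|Σ(p)| values are hypothetical and are not taken from the paper; T = 180 is assumed here simply because 15 years of monthly data give 180 observations, and K = 4 matches the four interrelated series.

```python
import math

def information_criteria(log_det_sigma, T, K, p):
    """AIC, BIC and HQC for a VAR(p) with K variables and sample size T."""
    n_params = p * K * K  # number of VAR coefficients, excluding constants
    return {
        "AIC": log_det_sigma + 2.0 / T * n_params,
        "BIC": log_det_sigma + math.log(T) / T * n_params,
        "HQC": log_det_sigma + 2.0 * math.log(math.log(T)) / T * n_params,
    }

def select_order(log_dets, T, K):
    """log_dets[p] holds ln|Sigma(p)| for p = 0..P; returns the minimizing p per criterion."""
    table = [information_criteria(ld, T, K, p) for p, ld in enumerate(log_dets)]
    return {name: min(range(len(table)), key=lambda p: table[p][name])
            for name in ("AIC", "BIC", "HQC")}

# Hypothetical ln|Sigma(p)| values for p = 0, 1, 2, 3 (illustration only).
log_dets = [12.0, 10.0, 9.95, 9.93]
chosen = select_order(log_dets, T=180, K=4)
print(chosen)  # {'AIC': 1, 'BIC': 1, 'HQC': 1}
```

With these illustrative numbers the drop in ln|Σ(p)| beyond p = 1 is smaller than every penalty increment, so all three criteria agree on order 1. With real data the criteria can disagree, AIC typically choosing the largest order.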

Diagnostic Checking of VAR Models
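Here, the Portmanteau test is usually employed to test for the overall significance of the residual autocorrelations of a VAR(p) model up to lag h (Lutkepohl, 2005). The hypotheses are
$$H_0: R_h = (R_1, \ldots, R_h) = 0 \quad \text{against} \quad H_1: R_h \neq 0$$
where $R_i$ (i = 1,2,⋯,h) are the autocorrelation matrices of the residuals. The test statistic for large T and h is
$$Q_h = T \sum_{i=1}^{h} \mathrm{tr}\big(\hat{C}_i' \hat{C}_0^{-1} \hat{C}_i \hat{C}_0^{-1}\big) \qquad (14)$$
A modified statistic with better small-sample properties is $\bar{Q}_h = T^2 \sum_{i=1}^{h} (T - i)^{-1} \mathrm{tr}\big(\hat{C}_i' \hat{C}_0^{-1} \hat{C}_i \hat{C}_0^{-1}\big)$. For $T \to \infty$, $T / [T^2 (T - i)^{-1}] \to 1$, and thus $\bar{Q}_h$ has the same asymptotic distribution as $Q_h$. That is, approximately,
$$Q_h \sim \chi^2\big(K^2 (h - p)\big) \qquad (15)$$
where K is the dimension of the time series, $\hat{C}_i$ are the estimated autocovariance matrices of the residuals and $\hat{C}_0$ is the estimated covariance matrix of the residuals.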
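The Portmanteau statistic can be computed directly from the residual autocovariance matrices. The sketch below is a minimal illustration, assuming Ĉ_i = T⁻¹ Σ û_t û′_{t−i} for mean-zero residuals and restricting the matrix inverse to the bivariate case to keep the code short. The residuals are simulated white noise rather than residuals from the fitted model, so p = 0 is used for the degrees of freedom. All names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def autocov(res, lag):
    """C_lag = (1/T) * sum over t of u_t u_{t-lag}' for mean-zero residual vectors u_t."""
    T, K = len(res), len(res[0])
    return [[sum(res[t][a] * res[t - lag][b] for t in range(lag, T)) / T
             for b in range(K)] for a in range(K)]

def inv2(m):
    """Inverse of a 2 x 2 matrix (this sketch handles the bivariate case only)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def portmanteau(res, h, adjusted=False):
    """Q_h = T * sum_i tr(C_i' C_0^{-1} C_i C_0^{-1}); the adjusted form weights by T^2/(T-i)."""
    T = len(res)
    c0_inv = inv2(autocov(res, 0))
    total = 0.0
    for i in range(1, h + 1):
        ci = autocov(res, i)
        ci_t = [list(col) for col in zip(*ci)]
        prod = matmul(matmul(matmul(ci_t, c0_inv), ci), c0_inv)
        term = prod[0][0] + prod[1][1]  # trace of the 2 x 2 product
        total += term / (T - i) if adjusted else term
    return (T * T if adjusted else T) * total

def degrees_of_freedom(K, h, p):
    return K * K * (h - p)

# Simulated bivariate white noise, centred so that the residual mean is zero.
random.seed(2)
raw = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
means = [sum(r[j] for r in raw) / len(raw) for j in range(2)]
res = [[r[j] - means[j] for j in range(2)] for r in raw]

q = portmanteau(res, h=10)
q_adj = portmanteau(res, h=10, adjusted=True)
print(f"Q_10={q:.2f}  adjusted={q_adj:.2f}  df={degrees_of_freedom(2, 10, 0)}")
```

For white noise residuals the statistic should be of the same order as its degrees of freedom. A value far above the chi-square critical value would indicate remaining autocorrelation.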

Modeling as Multivariate Time Series
A vector autoregressive (VAR) model was fitted to the vector series $X_t$ = (malaria, URTI, pneumonia, anaemia)'. The VAR lag selection results, with a maximum lag order of 15, are tabulated in Table 2.0. AIC, BIC and HQC were employed to estimate the order, p, of the VAR model. The order with the minimum values of AIC, BIC and HQC is 1. Hence, a VAR(1) model was tentatively identified. The estimates of the parameters of the VAR(1) model were obtained with the aid of Gretl software. The parameter estimates, standard errors and t-ratios for the estimated VAR(1) model are presented in Table 3.0.

Diagnostic Checks
To ensure the adequacy of the model and guard against model misspecification, diagnostic analysis of the residual series ($\hat{\varepsilon}_t$) was carried out. The autocorrelation matrices of the residuals from the VAR(1) model fitted to the interrelated time series suggest that the residuals form a multivariate white noise process. To confirm this, the overall significance of the residual autocorrelation matrices was examined with the Portmanteau test. The Q-statistic, $Q_{44} = 710.333$, was computed using (14) from the residual autocorrelation matrices, and the critical value was computed using (15) with $K^2(h - p) = 16 \times 43 = 688$ degrees of freedom. Since $Q_{44} = 710.333 < \chi^2_{0.05}(688) = 750.131$, we conclude that the fitted VAR(1) model is adequate. The parameters of the VAR(1) model (Table 3.0) that are less than 2 times their estimated standard errors in absolute value were regarded as insignificant and were set to zero. The resulting VAR(1) model in matrix form, model (17), is the final proposed model for the vector series on which the forecasts (Table 4.0) are based. The model revealed that upper respiratory tract infection, pneumonia and anaemia are linked to malaria.
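The degrees of freedom and critical value quoted above can be checked with a few lines of Python. The sketch below assumes a 5% significance level and uses the Wilson-Hilferty approximation to the chi-square quantile, which is accurate for large degrees of freedom. It is an independent check, not the computation reported in the paper, and the function name is hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile with df degrees of freedom."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3

K, h, p = 4, 44, 1               # four series, 44 lags, VAR order 1
df = K * K * (h - p)             # 16 * 43 = 688
critical = chi2_quantile_wh(0.95, df)
q_stat = 710.333                 # Q-statistic reported in the text
adequate = q_stat < critical     # fail to reject white noise residuals

print(f"df={df}  critical={critical:.1f}  adequate={adequate}")
```

The approximation gives a critical value of about 750.1, in agreement with the 750.131 reported, so the reported Q-statistic lies below it.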

RESULTS AND DISCUSSION
First, we considered the correlation structures among the component series. Prewhitening of the various series was done to determine the interrelationships among them. Univariate models [ARIMA (1,1,1), IMA (1,2), ARIMA (1,0,0), ARIMA (2,0,0) and ARIMA (1,0,0)] fitted well to the individual series [malaria ($X_{1,t}$), URTI ($X_{2,t}$), pneumonia ($X_{3,t}$), anaemia ($X_{4,t}$) and tetanus ($X_{5,t}$) respectively], and their residuals were obtained. The correlation matrices at a few lags ($\rho(l)$, l = 0,1,2,3) were computed from these residuals, and the residual correlation matrices $\rho(l)$, l = 1,2,3, are consistent with a multivariate white noise process. The correlation matrix, $\rho(0)$, of the residuals shown in Table 1.0 reveals that the tetanus series is uncorrelated with the other series at the 5% level of significance, whereas malaria, URTI, pneumonia and anaemia are interrelated. Thus, the four interrelated series were considered for the multivariate analysis. A vector autoregressive (VAR) process was fitted to the interrelated series. Order selection criteria were employed to determine the lag length of the VAR process, and a VAR(1) model was found to fit well. Diagnostic checks applied to ascertain the adequacy of the model confirmed that the VAR(1) model is appropriate. Hence, forecasts were generated for the interrelated series using the VAR(1) model.
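Forecasts from a VAR(1) model follow from iterating the fitted equation forward with the future errors set to their expected value of zero, so that the h-step forecast is C + Φ times the (h−1)-step forecast. The Python sketch below shows this recursion for a hypothetical bivariate model. The constants, coefficients and starting values are invented for illustration and are unrelated to the estimates in Table 3.0 or the forecasts in Table 4.0.

```python
def forecast_var1(const, phi, last_obs, steps):
    """Iterate x_hat(h) = C + Phi x_hat(h-1), starting from the last observed vector."""
    k = len(const)
    path, current = [], list(last_obs)
    for _ in range(steps):
        current = [const[i] + sum(phi[i][j] * current[j] for j in range(k))
                   for i in range(k)]
        path.append(current)
    return path

const = [1.0, 2.0]                 # hypothetical constants
phi = [[0.5, 0.0], [0.0, 0.5]]     # hypothetical coefficient matrix
path = forecast_var1(const, phi, last_obs=[0.0, 0.0], steps=3)
print(path)  # [[1.0, 2.0], [1.5, 3.0], [1.75, 3.5]]
```

For a stationary model the forecasts converge to the process mean as the horizon grows, here (2, 4).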

CONCLUSION
The malaria, URTI, pneumonia and anaemia series are interrelated, but they are unrelated to the tetanus series. Hence, malaria, URTI, pneumonia and anaemia are components of a multivariate time series. The VAR(1) model provides an adequate representation of the interrelated time series. Forecasts generated from the VAR(1) model indicate a gradual decrease in the occurrence of malaria, URTI, pneumonia and anaemia. Also, the model revealed that upper respiratory tract infection, pneumonia and anaemia are linked to or caused by malaria.

RECOMMENDATION
For patients diagnosed with any of these three diseases (URTI, pneumonia and anaemia), treatment should be administered simultaneously with treatment for malaria.
Scientists involved in modeling stochastic processes should not neglect the inclusion of other lagged variables that are likely to affect the variable under consideration. It is apparent from the correlation matrix in this work that different diseases are interrelated, as supported by medical scientists (Stein et al, 1994). Thus, analysis of this type can uncover hidden relationships among diseases. We therefore recommend that specialists in this field of study be given a chance to contribute to the health sector.