TIME SERIES MODELING OF DAILY ABANDONED CALLS IN A CALL CENTRE

Models for evaluating and predicting the short periodic time series of daily abandoned calls in a call center are developed. Abandonment of calls due to impatience is a recognized problem in most call centers. Two competing models were derived using the Fourier series and the Box and Jenkins modeling approaches. The selected models were shown to be both parsimonious and adequate using P-P plots, Q-Q plots and residual analysis. In validating and comparing the models, some efficiency measures were used with a view to determining the model which best represents the population under investigation. The Fourier series model was found to be more efficient than the Box and Jenkins model. The data for the application were obtained from a GSM telephone provider.


INTRODUCTION
Call centers provide tele-services to their customers with the aid of agents. As agents speak with customers over the phone, they interact with a computer terminal, inputting and retrieving information related to customers and their requests. Customers, who are only virtually present, are either being served or waiting in what we call a tele-queue: a phantom queue which they share, invisible to each other and to the agents who serve them. Customers wait in this queue until one of two things happens: an agent is allocated to serve them (through supporting software), or they become impatient and abandon the tele-queue.
In recent years, several documented studies of the incoming call arrival process have been conducted, thanks to technological advances in the call center industry. Earlier studies focused on classical Box and Jenkins Auto-Regressive-Moving-Average (ARMA) models, as in the FedEx study. A well-known study, also employing Auto-Regressive-Integrated-Moving-Average (ARIMA) techniques, was carried out by Andrews and Cunningham (1995) to produce daily forecasts for L.L. Bean's call center. That study focused on modeling two different arrival queues, each with its own characteristics. Their models incorporated exogenous variables alongside the moving average (MA) and autoregressive (AR) terms, using transfer functions to help predict outliers such as holidays and special sales promotion periods. In a more recent study, Taylor (2006) investigated several time series models on two different sources of data. Among these models were seasonal ARMA models, exponential smoothing for double seasonality, and dynamic harmonic regression. His results indicated that for short-term forecasting horizons, exponential smoothing for double seasonality performs quite well, but for practical horizons (longer than one day) a very basic averaging model outperforms all of the suggested alternatives.
Aldor-Noiman (2006) studied count models based on a mixed Poisson process approach. The first model uses the normal-Poisson stabilization in order to employ linear mixed model techniques, and the second model employs the Bayesian approach, implementing Gibbs sampling techniques and using the OpenBUGS software to produce the predictive distributions for future arrival counts. SenGupta & Ugwuowo (2006) proposed asymmetric angular-linear multivariate regression models, which were motivated by the need to predict some environmental characteristics based on circular and linear predictors. In that work, they adopted a simple cosine function for the wind energy and solar energy data. A measure of fit was provided through residual analysis. The purpose of this paper is to model the short periodic time series of daily abandoned calls in a call center. The Fourier series model and the Box and Jenkins model were developed with a view to determining the most efficient model. In section 2 we introduce the data and the background of this study. Section 3 discusses the two modeling approaches, while the model fitting is given in section 4. A measure of the efficiency of the fitted models is given in section 5. Discussion of the overall fit of the models and other concluding notes are given in section 6.

2. Data and background

Description of Call Center Operation
When a customer calls one of the several telephone numbers associated with the call center, except for rare busy signals, the customer is connected to a voice response unit (VRU) and identifies himself or herself. While using the VRU, the customer receives recorded information, both general and customized. It is also possible for customers to perform some self-service transactions here, and whenever a customer indicates interest in speaking to an agent, he/she dials another number as announced by the voice machine. If an agent who is capable of performing the desired service is free, the customer and the agent are matched and service starts immediately; otherwise the customer joins the tele-queue. Customers in the tele-queue are normally served on a first-come-first-served (FCFS) basis, and customers placed in the queue are distinguished by the time at which they arrive at the queue. While waiting, each customer periodically receives information on his/her progress in the queue. More specifically, he/she is told the amount of time that the first caller has been waiting, as well as his/her approximate position in the queue. The announcement is replayed every 60 seconds or so, interspersed with music, news, or commercials. The diagram below gives a schematic summary of the event history of calls through the system.

The Data Structure
The data archive all the calls handled by the call centre of a GSM phone service provider over a period of six months. Each file consists of records (lines), one record per phone call (between 20,000 and 30,000 calls per month), and each record has 13 fields.
Each entering phone call is first routed through one of the voice response units (VRUs), labeled AA01 to AA06 (6 digits); each VRU has several lines labeled 1-16. There are a total of 65 lines, and each call is assigned a VRU number, a line number and a call ID (5-digit number). Although the IDs are distinct, they are not necessarily consecutive, because they are assigned to different VRUs. The date of the call is captured in the format day-month-year (6-digit number). The time a call enters the call centre is recorded as a 6-digit number. More specifically, each calling customer must first be identified, which is done by providing the VRU with the customer ID; hence this is the time the call enters the VRU. The time the call exits the VRU, either to join the queue, to receive service directly, or to leave the system (abandonment), is also recorded as a 6-digit number. The time (in seconds) spent in the VRU (calculated as exit time − entry time) is captured as a 3-digit number. The time of joining the queue (being put on "hold") is recorded as a 6-digit number; this entry is 00:00:00 for customers who have not reached the queue (abandoned from the VRU). The time of exiting the queue, either to receive service or due to abandonment, and the time (in seconds) spent in the queue (calculated as q_exit − q_start) are also recorded. The possible outcomes for each phone call are: AGENT (service); HANG (hung up); PHANTOM (a virtual call to be ignored). Other information captured includes the time service by an agent begins, the time service by an agent ends, and the service duration in seconds (calculated as ser_exit − ser_start).
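As an illustration of how a daily series of abandoned calls could be built from such records, the following Python sketch counts the calls whose outcome is HANG on each calendar day. The dictionary keys `date` and `outcome`, the `ddmmyy` date encoding and the sample records are assumptions made for this example only; the source states only that each record carries a 6-digit date and one of the three outcomes.

```python
from collections import Counter
from datetime import datetime

def daily_abandoned(records):
    """Count abandoned (HANG) calls per calendar day.

    `records` is an iterable of dicts with the hypothetical keys
    'date' (a ddmmyy string) and 'outcome' (AGENT, HANG or PHANTOM).
    """
    counts = Counter()
    for rec in records:
        # AGENT calls were served and PHANTOM calls are virtual,
        # so only HANG outcomes count as abandonment.
        if rec["outcome"] == "HANG":
            day = datetime.strptime(rec["date"], "%d%m%y").date()
            counts[day] += 1
    # Return the counts in chronological order.
    return dict(sorted(counts.items()))

# Made-up records, for illustration only.
sample = [
    {"date": "010105", "outcome": "AGENT"},
    {"date": "010105", "outcome": "HANG"},
    {"date": "010105", "outcome": "PHANTOM"},
    {"date": "020105", "outcome": "HANG"},
    {"date": "020105", "outcome": "HANG"},
]
series = daily_abandoned(sample)
for day, count in series.items():
    print(day.isoformat(), count)
```

Days on which no call was abandoned do not appear in the result of this sketch; a complete daily series would need those days filled in with zero.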

Abandonment
The maximal time a customer is willing to wait in the queue is his patience time, A, also known as time-to-abandonment. The time he must wait before beginning service is his virtual queue time, V. The actual waiting time is W = min(A, V), terminated either by abandonment (whenever V > W) or by the beginning of service (V = W). In heavy traffic, even a small fraction of calls that abandon the queue can have a dramatic effect on system performance (Gans et al., 2003). On the theoretical side, for a many-server queue with abandonment operating under heavy traffic conditions, fluid approximations in Whitt (2004b) show that steady-state performance depends strongly upon the distribution of A beyond its mean. This suggests modeling daily abandonment, preferably through the number of callers that abandon each day.
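The relation W = min(A, V) and the two ways a wait can end may be sketched in a few lines of Python. The function name `actual_wait` and the numerical values are hypothetical and serve only to illustrate the definition.

```python
def actual_wait(patience, virtual_wait):
    """Return (W, outcome) for patience time A and virtual queue time V.

    W = min(A, V); the wait ends in abandonment when V > W
    and in service when V = W.
    """
    w = min(patience, virtual_wait)
    outcome = "abandoned" if virtual_wait > w else "served"
    return w, outcome

# A caller willing to wait 30 s who would have needed 45 s abandons after 30 s.
print(actual_wait(30, 45))
# A caller willing to wait 60 s who needs 45 s is served after 45 s.
print(actual_wait(60, 45))
```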

[Figure: schematic of the event history of a call through the system. Labels: incoming calls, VRU/IVR, queue, service, abandon, end of service.]

The Models

Fourier Series Model
The practical value of this approach to time series analysis rests most obviously on the empirical observation that many time series exhibit cyclic fluctuations, but at frequencies which are not always predictable before the data are examined. The periodogram is a summary description based on a representation of an observed time series as a superposition of sinusoidal waves of various frequencies. To motivate the periodogram, we first examine a very simple model for a time series $\{Y_t\}$ exhibiting cyclic fluctuations with a known period, $p$ say. We assume that

$$Y_t = \alpha \cos(\omega t) + \beta \sin(\omega t) + Z_t, \qquad t = 1, 2, \ldots, n, \tag{1}$$

where $\{Z_t\}$ is a white noise sequence, $\omega = 2\pi k/n$ is the known angular frequency of the cyclic fluctuations (so that the period is $p = n/k$), and $\alpha$, $\beta$ are the parameters. Here $2\pi \approx 6.2832$ radians, or 360 degrees.

Now consider a slightly less simplistic model in which we contemplate not one, but several sinusoidal components. This gives a model of the form

$$Y_t = \mu + \sum_{k} \left\{ \alpha_k \cos(\omega_k t) + \beta_k \sin(\omega_k t) \right\} + Z_t, \qquad t = 1, 2, \ldots, n,$$

where $\{Z_t\}$ is again a white noise sequence and each $\omega_k = 2\pi k/n$ is one of the Fourier frequencies. The estimates of the parameters can be written explicitly as

$$\hat{\mu} = \bar{y}, \qquad \hat{\alpha}_k = \frac{2}{n} \sum_{t=1}^{n} y_t \cos(\omega_k t), \qquad \hat{\beta}_k = \frac{2}{n} \sum_{t=1}^{n} y_t \sin(\omega_k t),$$

where $y_t$ is the observed value for period $t$ and $n$ is the data size. Figure 1 shows the data under consideration, which is made up of sine and cosine waves with different frequencies. The periodogram was originally used to detect and estimate the amplitude of a sine component of known frequency buried in noise. Since the number of observations $n$ is odd, the periodogram consists of $m = (n-1)/2$ values, i.e. $k = 1, 2, \ldots, m$.
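The estimation step can be sketched in Python under the assumption that the standard least-squares estimates at a Fourier frequency $\omega_k = 2\pi k/n$ are used, together with the periodogram ordinate $(n/2)(\hat{\alpha}_k^2 + \hat{\beta}_k^2)$. The function names and the synthetic weekly series below are illustrative and are not the call center data.

```python
import math

def fourier_fit(y, k):
    """Least-squares estimates (mean, alpha_k, beta_k) at w_k = 2*pi*k/n."""
    n = len(y)
    w = 2 * math.pi * k / n
    mean = sum(y) / n
    alpha = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y, start=1))
    beta = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y, start=1))
    return mean, alpha, beta

def periodogram(y):
    """Return (frequency k/n, intensity) pairs for k = 1, ..., (n - 1) // 2."""
    n = len(y)
    m = (n - 1) // 2
    values = []
    for k in range(1, m + 1):
        _, alpha, beta = fourier_fit(y, k)
        values.append((k / n, n / 2 * (alpha ** 2 + beta ** 2)))
    return values

# Synthetic, noise-free series with a weekly cycle over n = 35 days.
n = 35
y = [50 + 10 * math.cos(2 * math.pi * t / 7) + 4 * math.sin(2 * math.pi * t / 7)
     for t in range(1, n + 1)]

# k = 5 corresponds to the frequency 5/35 = 1/7 cycles per day.
mu_hat, alpha_hat, beta_hat = fourier_fit(y, k=5)
peak_freq = max(periodogram(y), key=lambda pair: pair[1])[0]
print(mu_hat, alpha_hat, beta_hat, peak_freq)
```

Because the weekly frequency is exactly a Fourier frequency when n = 35, the sketch recovers the mean and the two amplitudes of the synthetic series up to rounding, and the periodogram peaks at 1/7 cycles per day.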

2.2 Seasonal model
We consider the family of seasonal models since the series shows a marked seasonal pattern (see Figure 1). Abandoned calls have been found to exhibit periodic behavior with a period of one week, $s = 7$. The general multiplicative model as given in Box et al. (2008) is

$$\phi_p(B)\, \Phi_P(B^s)\, \nabla^d \nabla_s^D Y_t = \theta_q(B)\, \Theta_Q(B^s)\, Z_t,$$

where $B$ is the backshift operator, $\nabla = 1 - B$ and $\nabla_s = 1 - B^s$ are the regular and seasonal differencing operators, $\phi_p$, $\Phi_P$, $\theta_q$ and $\Theta_Q$ are polynomials of orders $p$, $P$, $q$ and $Q$ respectively, and $\{Z_t\}$ is a white noise sequence. For this particular model, $s = 7$.

[Figure 1: plot of the daily abandoned calls series over the six-month period; horizontal axis: day of the month.]

The Fourier Model
Detecting the best parsimonious model is an important step in the analysis of data sets. After estimating the parameters of the model, several diagnostic tools were used to assess its goodness of fit. Under the model assumptions, the standardized residuals ought to resemble a sample from a standard normal distribution. Hence, in Figure 2, we consider the normal P-P plot and the Q-Q plot of the standardized residuals to assess the goodness of fit of the model. The points closely follow a straight line, suggesting that the fitted model is adequate. A summary of the analysis for the parameters of the model, as obtained using SPSS, is given in Tables 1, 2 and 3. Figure 3 shows the plot of the standardized residuals against the fitted values. Once more we observe no definite pattern that would cause us to doubt the fitted model. The histogram of the standardized residuals can be seen in Figure 4.
The plot is somewhat symmetric and tails off at both the high and low ends, as a normal distribution does. Identification of the model was made after comparing the strength of the relationship between several candidate models and the dependent variable.
for l = 1, 2, 3, ..., n
After an appropriate time series model has been tentatively identified, the next step is to estimate the parameters of the model. Table 4 gives the final estimates of the model, while Figure 6 shows the graph of the fitted model.
The chi-square test for adequacy of the model gave a calculated value of 27.30 against a tabulated value of 55.6. Since the calculated value is less than the tabulated value, we conclude that the fitted model is adequate. A further test of adequacy was done by examining the residuals between the actual and fitted values, and these were found to be a white noise process. Fig. 7 shows the plot of the residuals versus the fitted values; the points are clustered around zero, which is an indication of a good fit. Fig. 8 shows that the histogram of the residuals is approximately normal, as expected. Figures 9 and 10 show the P-P plot and the Q-Q plot respectively. The correlogram of the standardized residuals is shown in Figure 11. All values lie within the horizontal dashed lines, which indicates that the stochastic component of the series is white noise. The model was also found to be both stationary and invertible.
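The source does not name the chi-square test it uses. One common choice for this kind of adequacy check is the Ljung-Box portmanteau statistic, which is assumed in the following sketch together with the approximate band of plus or minus $2/\sqrt{n}$ for the residual correlogram. The function names and the simulated residuals are illustrative only.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def ljung_box(x, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    acf = sample_acf(x, max_lag)
    return n * (n + 2) * sum(r ** 2 / (n - k) for k, r in enumerate(acf, start=1))

def lags_outside_band(x, max_lag):
    """Lags whose autocorrelation exceeds the approximate 2/sqrt(n) band."""
    bound = 2 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

# Simulated residuals standing in for the residuals of a fitted model.
rng = random.Random(1)
residuals = [rng.gauss(0, 1) for _ in range(181)]
q_stat = ljung_box(residuals, max_lag=20)
# The model is judged adequate when q_stat is below the tabulated chi-square
# value, with degrees of freedom equal to max_lag minus the number of
# estimated ARMA parameters.
print(q_stat, lags_outside_band(residuals, max_lag=20))
```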

4. Efficiency of the fitted models
The foregoing analysis of the two modeling approaches has shown that the fitted models are both adequate and parsimonious. In this circumstance, the models identified using the two approaches can be considered as competing models for a parsimonious representation of the short periodic time series. However, if the Fourier series model is an adequate model, how much do we lose if we use the less parsimonious SARIMA$(p,d,q)(P,D,Q)_7$ model? To approach this problem, we must first consider how to compare the two models. The parameters themselves are not directly comparable, but we can compare the estimates of the models at comparable time points. Consider the estimates of the two models at the first day. In the Fourier series model, we estimate the number of abandoned calls on the first day with a standard deviation that is about 55% as large as it would be if we estimated it with the SARIMA model. This shows that the Fourier series model is more efficient than the SARIMA model.
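The comparison can be read as a ratio of standard deviations of the two estimates at the same time point. The sketch below shows the arithmetic under that reading. The function names and the two input standard deviations are hypothetical and are chosen only so that the ratio equals the 55% reported above; they are not the values obtained from the fitted models.

```python
def sd_ratio(sd_a, sd_b):
    """Standard deviation of estimate A relative to that of estimate B."""
    return sd_a / sd_b

def relative_efficiency(sd_a, sd_b):
    """Variance of B divided by variance of A; values above 1 favor A."""
    return (sd_b / sd_a) ** 2

# Hypothetical standard deviations of the day-one estimates.
sd_fourier, sd_sarima = 5.5, 10.0
print(sd_ratio(sd_fourier, sd_sarima))             # ratio of standard deviations
print(relative_efficiency(sd_fourier, sd_sarima))  # ratio of variances
```

A standard deviation ratio of 0.55 corresponds to a variance ratio of 0.55² ≈ 0.30, so under this reading the SARIMA estimate has roughly 3.3 times the variance of the Fourier series estimate.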

CONCLUSION
In this paper we have presented two approaches to modeling a periodic time series. The main emphasis is on Fourier series modeling of daily abandoned calls in a GSM call center and its comparison with the Box and Jenkins model. The analysis involved developing several models using both approaches and selecting the best of those models. The accuracy of the estimates under each approach was assessed, and the reliability and efficiency of the approaches were also investigated. The residual standard deviation gives an absolute measure of the goodness of fit of the estimated model, and since its value is very low, it indicates a good fit. The t-values (t-ratios) show that in each case the null hypothesis is rejected. A plot of the standardized residuals versus the corresponding fitted values shows no definite pattern. The frequency histograms of the standardized residuals for the two models are somewhat symmetric and tail off at both the high and low ends, as a normal distribution does. A further check for normality is provided by the plot of normal scores, or quantile-quantile (Q-Q) plot, which shows the quantiles of the data versus the theoretical quantiles of a normal distribution. With normally distributed data, the Q-Q plot looks approximately like a straight line. The straight-line pattern of the Q-Q plot for the standardized residuals supports the assumption of a normally distributed stochastic component in the model. Finally, a test for possible dependence in the stochastic component was done by plotting the sample autocorrelations of the standardized residuals. All values lie within the horizontal dashed lines, which are placed at zero plus and minus two approximate standard errors of the sample autocorrelation. It is therefore reasonable to infer that the stochastic component of the series is white noise.
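The points of a normal Q-Q plot of the kind described above can be computed as in the following sketch. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function name and the example residuals; the source does not state how SPSS was configured.

```python
from statistics import NormalDist, mean, pstdev

def normal_qq_points(residuals):
    """Return (theoretical quantile, ordered standardized residual) pairs."""
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    ordered = sorted((r - mu) / sigma for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up residuals, for illustration only.
points = normal_qq_points([2.0, -1.0, 0.5, 1.5, -3.0])
for theoretical_q, sample_q in points:
    print(round(theoretical_q, 3), round(sample_q, 3))
```

If the residuals are approximately normal, these pairs fall close to the line through the origin with unit slope.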