Classification rates: non‐parametric versus parametric models using binary data
Estimation of the conditional mean, and of the marginal effects of small changes in the covariates, is of interest in finance, economics and education. The standard approach has been to specify a parametric model such as probit or logit and then estimate the coefficients by maximum likelihood. This is only applicable when the form of the distribution from which the data have been drawn is known. Non‐parametric methods have been proposed for cases where the functional form assumptions cannot be ascertained. This research sought to establish whether non‐parametric modeling achieves a higher correct classification ratio than a parametric model. The local likelihood technique was used to fit the data sets. The same data sets were then modeled using a parametric logit, and the abilities of the two models to correctly predict the binary outcome were compared. The results showed that non‐parametric estimation gives a better prediction rate (classification ratio) for binary data than parametric estimation. This was shown both empirically and through simulation. For the empirical results, two different data sets were used. The first consisted of customers' loan applications and the second of approved loans. The non‐parametric method achieved a classification ratio of 1 on both data sets, whereas the parametric logit achieved 0.87 on the first (only 87 out of the 100 observations were correctly classified) and 0.83 on the second. Simulation was based on sample sizes of 25, 50, 75, 100, 150, 200, 250, 300 and 500. The simulated results further showed that the accuracy of both models decreases as sample size increases.
Key words: parametric, non‐parametric, local likelihood, logit, confusion matrix, classification ratio
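
To make the comparison described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single covariate, a Gaussian kernel, a local-linear logit fitted by Newton steps with a small ridge term for numerical stability, a 0.5 classification threshold, and synthetic data; the function names, the bandwidth h and the data-generating curve are all hypothetical choices made for illustration. The classification ratio is taken to be the proportion of correctly classified observations, i.e. the diagonal of the confusion matrix divided by its total.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_weighted_logit(X, y, w, n_iter=50, ridge=1e-3):
    """Newton steps on a weighted logit log-likelihood.
    The small ridge penalty is an assumption added for numerical stability."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (w * (y - mu)) - ridge * beta
        hess = (X * (w * mu * (1.0 - mu))[:, None]).T @ X + ridge * np.eye(p)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def global_logit_predict(x, y, x_eval):
    """Parametric logit: one coefficient vector for the whole sample."""
    X = np.column_stack([np.ones_like(x), x])
    beta = fit_weighted_logit(X, y, np.ones_like(x))
    return sigmoid(beta[0] + beta[1] * x_eval)

def local_logit_predict(x, y, x_eval, h):
    """Local likelihood: refit a kernel-weighted logit centred at each point x0.
    The intercept of the local fit estimates P(y = 1 | x = x0)."""
    probs = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
        X = np.column_stack([np.ones_like(x), x - x0])
        probs[i] = sigmoid(fit_weighted_logit(X, y, w)[0])
    return probs

def confusion_matrix(y_true, y_pred):
    """2x2 table: rows are observed classes, columns are predicted classes."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[int(t), int(p)] += 1
    return cm

def classification_ratio(cm):
    """Share of observations on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()

def demo(n=100, h=0.4, seed=0):
    """In-sample comparison on synthetic data with a non-monotone response."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, n)
    y = rng.binomial(1, sigmoid(2.0 * np.sin(2.0 * x)))
    pred_par = (global_logit_predict(x, y, x) >= 0.5).astype(int)
    pred_loc = (local_logit_predict(x, y, x, h) >= 0.5).astype(int)
    return (classification_ratio(confusion_matrix(y, pred_par)),
            classification_ratio(confusion_matrix(y, pred_loc)))
```

Calling demo() returns the in-sample classification ratios of the parametric and the local likelihood fits, in that order. The values depend on the seed, the bandwidth and the simulated curve, so they should not be read as a reproduction of the figures reported above. A small bandwidth lets the local fit follow the sample very closely, which is one reason an in-sample classification ratio can approach 1.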