ON THE PROBLEMS OF PPS SAMPLING IN MULTI-CHARACTER SURVEYS

This paper compares the efficiency of several estimators used in PPSWR sampling for multiple characteristics. Under a superpopulation model, we computed the expected variances of the estimators for each of the first two finite populations considered, together with the exact bias and variance of each estimator. The results show that the estimators proposed by Rao (1966) and Amahia et al. (1989), and the alternative estimator in Amahia et al. (1989), are more efficient than the conventional estimator. In population I, where the study variable and the ancillary variable are highly and positively correlated, the estimator in Amahia et al. (1989) fares better than the alternative estimator. In population II, where the correlation between the study variable and the ancillary variable is poor, the alternative estimator in Amahia et al. (1989) is more efficient. Several other finite populations whose correlation coefficients ρ are neither as high as in population I nor as poor as in population II were also considered, and the competition for efficiency was found to rest only with the estimators suggested by Amahia et al. (1989) and Rao (1966). The comparative results are shown in tables.


INTRODUCTION
In sampling, the sampling units, as usually defined, are similar in size and structure. However, with some types of population it is convenient or necessary to use sampling units that differ in size. Thus, the farm is often the sampling unit for collecting agricultural data, although farms in the same region may range from a few acres to over 1,000 acres. Similarly, when information about sales or prices is obtained, the sampling unit may be a dealer or store, ranging from small to large concerns.
Again, the total sample size can be subject to unduly large variation if it is based on random selection of clusters that differ greatly in size. If we subsample the selected clusters at a fixed rate, the expected sizes of the subsamples are proportional to the unequal cluster sizes. The total sample size depends on which clusters happen to fall into the sample.
In such cases the question arises: should differences between the sizes of the sampling units be ignored, or taken into consideration in selecting the sample and in making estimates from it? The differences should not be ignored; otherwise the random sampling would be uncontrolled.
To account for differences between the sizes of the sampling units, we use sampling with varying (unequal) probabilities. The most common form is sampling with probability proportional to 'size' (PPS), the size being the value of an ancillary variable (Cochran, 1946). This procedure uses the values of the ancillary variable to assign unequal probabilities of selection to the population units. Hence, if the values of an ancillary variable related to the study variable are known for all N units, this information can be used in selecting the sample so as to provide estimators more efficient than those from simple random sampling. PPS sampling can be with replacement (PPSWR), where each unit drawn is replaced before the next draw is made, or without replacement (PPSWOR), where no unit drawn is replaced before the next draw. In large-scale surveys, however, it would be uneconomical to carry out a survey mainly to estimate one parameter when many other parameters could be estimated at little or no additional cost. It is therefore usually of interest to estimate parameters relating to several characteristics. Yet only a single measure of size can be used in selecting the sample of primary units with PPS, and in such multi-character surveys some characteristics may not be related to that size (the ancillary variable). This situation has led to the development of many alternative estimators, which are examined in this work.
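To make the PPSWR procedure concrete, the following is a minimal Python sketch, not taken from the paper. It assumes selection probabilities $p_i = x_i/\sum_j x_j$ and uses the conventional (Hansen-Hurwitz) estimator $\hat{Y}_C = \frac{1}{n}\sum y_i/p_i$ together with its usual unbiased variance estimate; the function names and the acreage and yield figures are hypothetical.

```python
import random


def ppswr_sample(x, n, rng):
    """Draw n unit indices with replacement; unit i has probability x[i]/sum(x) at every draw."""
    return rng.choices(range(len(x)), weights=x, k=n)


def conventional_estimate(y, p, sample):
    """Conventional PPSWR estimate of the total Y and its unbiased variance estimate."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two draws are needed for the variance estimate")
    z = [y[i] / p[i] for i in sample]                   # y_i / p_i for each draw
    y_hat = sum(z) / n                                  # mean of the n ratios
    v_hat = sum((zi - y_hat) ** 2 for zi in z) / (n * (n - 1))
    return y_hat, v_hat


# Hypothetical farm sizes (acres, the ancillary variable x) and yields (study variable y).
x = [12, 45, 150, 600, 1100]
y = [30, 100, 320, 1250, 2300]
total_x = sum(x)
p = [xi / total_x for xi in x]

rng = random.Random(2024)
sample = ppswr_sample(x, n=3, rng=rng)
print(sample, conventional_estimate(y, p, sample))
```

Because y is roughly proportional to x in this toy population, the ratios $y_i/p_i$ are nearly constant and the estimate changes little from sample to sample, which is the situation in which PPS selection pays off.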

A. C. AKPANTA

Recognition of the value of sampling with probability proportional to size, especially as a prelude to subsampling and when stratification with respect to other characteristics is desired, is due to Hansen and Hurwitz (1943). They introduced the selection of primary units with probability proportional to some measure of their size, for sampling one primary unit per stratum. Lahiri (1951) advanced a sampling scheme in which the sample is selected with probability proportional to its total size in terms of the ancillary variable. He also presented a method for actually drawing the sample that avoids the need to list all possible samples and to find their total and cumulative sizes. Grundy (1954) developed a practical method of drawing sampling units with probabilities exactly proportional to size, in which both preliminary calculations and the addition of a large number of sizes are avoided. This method can be regarded as an extension of Lahiri's method for samples of one.
Although a very large number of papers have been written since 1934 extolling the virtues of various modes of sampling with unequal probabilities, little has been done on PPS sampling in multi-character surveys. Rao (1966), who first looked into this area, suggested an alternative estimator of the population total for characteristics that are poorly correlated with the selection probabilities in PPS sampling schemes for multiple characteristics. He compared the alternative estimators with the conventional ones under a superpopulation model and showed that, under that model, the average variance of the alternative estimators is smaller than that of the conventional estimators. For this efficiency comparison, Rao regarded the finite population as drawn from an infinite superpopulation in which the study variable y and the ancillary variable x are independent. The results therefore apply not to any single finite population but to the average over all finite populations that can be drawn from the superpopulation. Bansal and Singh (1985) put forward another alternative estimator of the population total, for PPS sampling with replacement, for characteristics that are poorly correlated with the selection probabilities; it makes use of a rough value of the correlation coefficient between the study variable y and the ancillary variable x.
Their proposal was motivated by the fact that the situation considered in the model of Rao (1966) is "not commonly encountered in practice, since hardly can the correlation in the population be exactly equal to zero". Although Bansal and Singh state that the bias of their estimator is expected to be smaller than that of the corresponding estimator in Rao (1966), they did not derive any expressions for the bias and did not make the necessary comparison. Indeed, the expressions and the condition needed to show that their proposed estimator is more efficient than Rao's estimator are, to put it mildly, 'quite complicated and difficult' to handle algebraically.
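Thus, despite the several estimators proposed for PPS sampling in multi-character surveys, none can be considered entirely satisfactory from the standpoint of both precision and practical applicability. It is therefore not surprising that Amahia et al. (1989) proposed further estimators for this setting. They also studied the efficiency of their estimators compared with other related estimators.

RELEVANT THEOREMS AND

Theorem. If a sample of n units is drawn with replacement, the probability of selecting the ith unit at each draw being $p_i$ (with $\sum_{i=1}^{N} p_i = 1$), then the conventional estimator
$$\hat{Y}_C = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i},$$
where the sum runs over the n draws, is an unbiased estimate of the population total Y with variance
$$V(\hat{Y}_C) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y\right)^2 = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{y_i^2}{p_i} - Y^2\right).$$

Proof: Let $t_i$ be the number of times that the ith unit appears in a specific sample of size n, where $t_i$ may take any of the values 0, 1, 2, …, n. Consider the joint frequency distribution of the $t_i$ for all N units in the population. The method of drawing the sample is equivalent to the standard probability problem in which n balls are thrown into N boxes, the probability that a ball goes into the ith box being $p_i$ at every throw. Consequently, the joint distribution of the $t_i$ is the multinomial
$$\frac{n!}{t_1!\,t_2!\cdots t_N!}\,p_1^{t_1}p_2^{t_2}\cdots p_N^{t_N}.$$
For the multinomial, the following properties of the distribution of the $t_i$ are well known:
$$E(t_i) = np_i,\qquad V(t_i) = np_i(1-p_i),\qquad \operatorname{Cov}(t_i,t_j) = -np_ip_j \quad (i \neq j).$$
The estimator can be written as
$$\hat{Y}_C = \frac{1}{n}\sum_{i=1}^{N} t_i\,\frac{y_i}{p_i},$$
where the sum extends over all units in the population. In repeated sampling the $t_i$ are the random variables, whereas the $y_i$ and $p_i$ are a set of fixed numbers. Hence
$$E(\hat{Y}_C) = \frac{1}{n}\sum_{i=1}^{N} np_i\,\frac{y_i}{p_i} = \sum_{i=1}^{N} y_i = Y,$$
and
$$V(\hat{Y}_C) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\frac{y_i^2}{p_i^2}\,np_i(1-p_i) - 2\sum_{i<j}\frac{y_iy_j}{p_ip_j}\,np_ip_j\right] = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{y_i^2}{p_i} - Y^2\right).$$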
This completes the proof.
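The theorem can be checked numerically on a small population by enumerating all $N^n$ ordered samples. The sketch below, assuming hypothetical values of $y_i$ and $x_i$ and hypothetical function names, uses exact rational arithmetic so that $E(\hat{Y}_C) = Y$ and the variance formula are confirmed exactly rather than approximately.

```python
from fractions import Fraction
from itertools import product


def exact_moments_conventional(y, p, n):
    """Enumerate every ordered PPSWR sample of size n and return the exact
    expectation and variance of the conventional estimator (1/n) * sum(y_i / p_i)."""
    N = len(y)
    mean = Fraction(0)
    second = Fraction(0)
    for draws in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in draws:
            prob *= p[i]                                 # draws are independent
        est = sum(Fraction(y[i]) / p[i] for i in draws) / n
        mean += prob * est
        second += prob * est * est
    return mean, second - mean * mean


def theorem_variance(y, p, n):
    """V(Y_C) = (sum y_i^2 / p_i - Y^2) / n, as stated in the theorem."""
    Y = sum(y)
    return (sum(Fraction(yi) ** 2 / pi for yi, pi in zip(y, p)) - Y ** 2) / n


# Hypothetical small population: study variable y, size measure x.
y = [3, 7, 5, 9]
x = [2, 5, 4, 9]
p = [Fraction(xi, sum(x)) for xi in x]

mean, var = exact_moments_conventional(y, p, n=3)
print(mean, sum(y), var, theorem_variance(y, p, n=3))
```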
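Proof: By the usual algebraic identity,
$$\sum_{i=1}^{N} t_i\left(\frac{y_i}{p_i} - \hat{Y}_C\right)^2 = \sum_{i=1}^{N} t_i\,\frac{y_i^2}{p_i^2} - n\hat{Y}_C^2.$$
Introducing the variable $t_i$ and taking expectations, we have
$$E\left[\sum_{i=1}^{N} t_i\left(\frac{y_i}{p_i} - \hat{Y}_C\right)^2\right] = n\sum_{i=1}^{N}\frac{y_i^2}{p_i} - n\left[V(\hat{Y}_C) + Y^2\right].$$
But from (7), we have $\sum_{i=1}^{N} y_i^2/p_i - Y^2 = nV(\hat{Y}_C)$, so the right-hand side equals $n^2V(\hat{Y}_C) - nV(\hat{Y}_C) = n(n-1)V(\hat{Y}_C)$. Hence, an unbiased sample estimate of $V(\hat{Y}_C)$ is given by
$$\hat{V}(\hat{Y}_C) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{Y}_C\right)^2,$$
the sum now running over the n draws. The variance $V(\hat{Y}_C)$ will be smaller than the corresponding variance under equal probability sampling when $y_i/p_i$ is nearly constant, that is, when $y_i$ is roughly proportional to $p_i$. On the other hand, if $y_i$ and $p_i$ are unrelated, an estimator that ignores the $p_i$ would tend to have a smaller variance than $\hat{Y}_C$. According to Rao (1966), if $y$ is unrelated to $p$, the population total should instead be estimated by
$$\hat{Y}_R = \frac{N}{n}\sum_{i=1}^{n} y_i.$$
Observe that the estimator in (13) and its variance in (16) have the same form as the estimator $\hat{Y}$ and its variance estimator in equal probability sampling.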
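To compare the conventional estimator with Rao's estimator on one particular finite population, rather than on average over a superpopulation, both can be evaluated exactly. The sketch below assumes Rao's estimator has the equal probability form $\hat{Y}_R = (N/n)\sum y_i$ given above. Each draw is then an independent variable taking the value $Ny_i$ with probability $p_i$, which gives $B(\hat{Y}_R) = N\sum p_iy_i - Y$ and $V(\hat{Y}_R) = \frac{N^2}{n}\left[\sum p_iy_i^2 - \left(\sum p_iy_i\right)^2\right]$. These expressions are derived here for illustration rather than quoted from the paper; the function name and population values are hypothetical, and the Bansal-Singh and Amahia et al. estimators are omitted because their formulas are not reproduced in this section.

```python
from fractions import Fraction


def exact_bias_variance(y, p, n):
    """Exact (bias, variance) under PPSWR with n draws for
    'C': conventional estimator (1/n) * sum(y_i / p_i)
    'R': Rao-type estimator (N/n) * sum(y_i), assumed equal-probability form."""
    N = len(y)
    Y = sum(y)
    var_c = (sum(Fraction(yi) ** 2 / pi for yi, pi in zip(y, p)) - Y ** 2) / n
    m1 = sum(pi * yi for yi, pi in zip(y, p))           # E(y at one draw)
    m2 = sum(pi * yi ** 2 for yi, pi in zip(y, p))      # E(y^2 at one draw)
    bias_r = N * m1 - Y
    var_r = Fraction(N ** 2, n) * (m2 - m1 ** 2)
    return {"C": (Fraction(0), var_c), "R": (bias_r, var_r)}


# Hypothetical population in which y is not proportional to the selection probabilities.
y = [1, 2, 3]
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

res = exact_bias_variance(y, p, n=2)
mse = {k: b ** 2 + v for k, (b, v) in res.items()}     # mean square error = bias^2 + variance
print(res, mse)
```

For this toy population the mean square error of $\hat{Y}_R$ (117/32) is well below the variance of $\hat{Y}_C$ (9), even though $\hat{Y}_R$ is biased; with $y$ closer to proportional to $p$ the ordering would reverse.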

Since the correlation in the population is never exactly equal to zero (a condition implied by Rao's estimator), Bansal and Singh (1985) developed a new estimator of the population total for characteristics that are poorly correlated with the selection probabilities, which takes into account a rough value of the correlation coefficient ρ; when ρ = 0, the estimator in (21) reduces to Rao's estimator in (14). The referee of Amahia et al. (1989) suggested an alternative estimator worth mentioning. This estimator is easy to construct and is also motivated by the relation in (24).

METHODS OF PPSWR SAMPLING
Methods of PPSWR selection include the cumulative total method, Grundy's method (1954), selection from a map, PPS systematic sampling and Lahiri's method (1951), which we adopt in this paper.
In using this method, we let α denote the maximum of the $x_i$ (obtained by inspection), then choose a random number r in the range $0 < r \le \alpha$ and a random integer s in the range 1 to N. If $r \le x_s$, unit s is accepted as a member of the sample; otherwise, another pair of random numbers is tried. This continues until the required number of sample units is obtained. Naturally, this method involves the fewest rejections when the $x_i$ do not differ too much in size (Cochran, 1977, p. 251).

After simplifying some algebra, Amahia et al. (1989)
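Lahiri's procedure is an acceptance-rejection scheme: a unit proposed uniformly at random is kept with probability $x_s/\alpha$, so accepted units occur with probability proportional to $x_s$. A minimal Python sketch follows, assuming non-negative size measures with a positive maximum; it draws r from a continuous uniform distribution on $[0, \alpha]$ rather than from integers, and the function name and size values are hypothetical.

```python
import random
from collections import Counter


def lahiri_ppswr(x, n, rng):
    """Select n units with replacement, P(unit s) proportional to x[s], by Lahiri's method.
    Returns the selected indices and the number of (s, r) pairs tried."""
    if any(xi < 0 for xi in x) or max(x) <= 0:
        raise ValueError("size measures must be non-negative with a positive maximum")
    alpha = max(x)                    # alpha = max x_i, found by inspection
    N = len(x)
    sample, trials = [], 0
    while len(sample) < n:
        trials += 1
        s = rng.randrange(N)          # random unit index (0-based here, 1..N in the text)
        r = rng.uniform(0, alpha)     # random number in [0, alpha]
        if r <= x[s]:                 # accept unit s; otherwise try a new pair
            sample.append(s)
    return sample, trials


x = [12, 45, 150, 600, 1100]
sample, trials = lahiri_ppswr(x, n=10000, rng=random.Random(7))
counts = Counter(sample)
for s, xs in enumerate(x):
    print(s, round(counts[s] / len(sample), 3), round(xs / sum(x), 3))
print("acceptance rate:", round(len(sample) / trials, 3))
```

Each pair is accepted with probability $\bar{x}/\alpha$ on average, which is why the method wastes few draws when the $x_i$ are of similar size and many when one unit is much larger than the rest.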

POPULATIONS I-V
Although 100 different populations were considered, Tables 6 and 7 report results for only five of them. From Table 7, one can observe that $\hat{Y}_P$ beats all other estimators in populations I, III and V, since it has the smallest variance in each case. However, $\hat{Y}_R$ fares better than all others in population IV, whereas $\hat{Y}_{P1}$ fares better than the rest in population II, where ρ = 0.062. The table also shows that $\hat{Y}_C$, $\hat{Y}_P$ and $\hat{Y}_{P1}$ are better estimators than $\hat{Y}_R$ when the study variable has a high positive correlation (ρ ≥ 0.5) with the ancillary variable, whereas $\hat{Y}_R$, $\hat{Y}_P$ and $\hat{Y}_{P1}$ are better than $\hat{Y}_C$ when ρ < 0.5.

CONCLUSION
From the foregoing, and within the limitations of the superpopulation model assumed by Rao (1966), one can conclude empirically that in multiple-character surveys under PPS sampling with replacement, if the correlation between each study variable and the ancillary variable is not at either extreme (0 or 1), then: (1) $\hat{Y}_C$, $\hat{Y}_P$ and $\hat{Y}_{P1}$ are better estimators than $\hat{Y}_R$ when the study variable has a high positive correlation (ρ ≥ 0.5) with the ancillary variable; (2) $\hat{Y}_R$, $\hat{Y}_P$ and $\hat{Y}_{P1}$ are better estimators than $\hat{Y}_C$ when the study variable has a poor positive correlation (ρ < 0.5) with the ancillary variable; and (3) $\hat{Y}_R$ and $\hat{Y}_P$ are the better estimators when the correlation coefficient is neither as high as in Population I nor as poor as in Population II.