Prediction of 305-day milk yield in Brown Swiss cattle using artificial neural networks

Abstract
Artificial neural networks (ANNs) have been shown to be a powerful tool for system modelling in a wide range of applications. In this paper, we focus on the capability of ANNs to predict 305-d milk yield in early lactation of Brown Swiss cattle, based on a few test-day records and some environmental factors such as age, number of lactation and season of calving. The ANNs that were developed were compared with multiple linear regressions (MLR). Various ANNs were modelled, and the best-performing number of hidden layers, number of neurons and training algorithm were retained. The best ANN model had input, hidden and output layers with the tansig transfer function. The layers had 4, 8, and 1 neurons, respectively. The mean predicted values calculated by the ANNs were close to the real mean values and did not differ from them statistically. On the other hand, the predicted mean values calculated by MLR and the real mean values were significantly different from each other. The best ANN prediction was obtained when the 1st, 2nd, 3rd, and 4th test-day records were entered into the system (X1-X8). In this study, the prediction of 305-d milk yield by ANN gave better results than those of MLR, suggesting that ANN can be used as an alternative prediction tool.


Introduction
Dairy cattle breeding programmes are based primarily on milk yield and milk composition. Most cows are machine milked twice a day. The official milk records for the herd book are obtained by monthly testing during lactation. Farmers' incomes are derived from milk production and its composition. The accurate measurement or prediction of milk yield is essential to their economy (Fernandez et al., 2007).
Milk yield records that are shorter than the standard lactation should also be used to reduce the bias in estimating breeding values of sires owing to differences in the culling rates among the progeny groups. Early estimates of the sire's breeding value by extending lactations in progress can help to reduce the generation interval, as well as increase the intensity of selection, and thus create greater genetic progress. This early information can allow the farmer to decide whether cows should be kept for breeding. Furthermore, it helps in allocating resources such as feed, both for an individual cow and for a herd (Khan et al., 2005). Lacroix et al. (1995) reported that artificial neural networks (ANNs) allowed for an earlier and a more accurate prediction of milk production in cows. Such improvement is particularly important early in lactation, when a 305-d milk yield can be difficult to predict, and where such a prediction can have serious implications for the choice of future bull-dams. Early detection of low-producing animals is also important for timely culling decisions, with their associated economic benefits (Kominakis et al., 2002). In dairy production, prediction of milk yield is important, in that much of the selection of genetically superior bulls is based on their ability to produce high-yielding daughters. Therefore, the sooner these bulls can be identified, the sooner the collection of semen can commence and insemination of cows can proceed (Sharma et al., 2007).
Artificial neural networks are based on the neural structure of the human brain, which processes information by means of interaction among many neurons. In the past few years there has been a constant increase in interest in neural network modelling in various fields of material science (Taskin et al., 2008). These networks consist of many simple units working in parallel with no central control, and learning takes place by modifying the weights between connections. The basic components of an ANN are neurons, weights and learning rules (Stich et al., 2000). Neurons are organized in layers that process the input information and pass it to the following layer. The processing ability of the network is stored in the inter-unit connection strengths (or weights) that are obtained through a process of adaptation to a set of training patterns (Fernandez et al., 2007). Methods based on ANNs seem particularly appropriate in a number of applications, owing to their ability to predict results by learning from the historical data sets of the problem without knowing the interactions among parameters, even if these are highly nonlinear. This ability of ANNs to predict relationships between input variables and their corresponding outputs in a complex biological system has resulted in some inspiring successes (Sharma et al., 2007).
With ANNs, there is no need to begin with an a priori model; nor is there a need to identify all the required variables beforehand. Artificial neural networks have also facilitated the combination of input types (e.g. binary and continuous) and they are potentially advantageous in modelling biological processes that are often characterized as highly non-linear (Lacroix et al., 1995).
In practice, ANNs have been successfully applied in many disciplines, such as engineering, economic prediction and medical diagnosis. There has been relatively little research into the application of ANNs in the field of animal breeding. This is quite paradoxical, as data analyses are usually carried out in this field, and ANNs have been shown to be more powerful than classical statistical methods for these kinds of analyses (Fernandez et al., 2006). The reported research has focused on disease detection and dairy cattle breeding, which is concerned with predicting individual milk, fat and protein production. Yang et al. (1999; 2000) applied ANNs to analyses related to predicting clinical mastitis in cattle and found that the technology was able to determine major factors related to the presence or absence of mastitis and to detect influential variables in predicting the incidence of clinical mastitis in dairy cows. Lacroix et al. (1995) and Salehi et al. (1998b) used the networks in milk yield predictions, and demonstrated that adequate preprocessing, a well-designed network model, and a proper set of variables may considerably influence the accuracy of milk production predictions. Salehi et al. (1998a; b) found a neural network model based on back-propagation learning useful in predicting 305-d milk yield, fat and protein. Sanzogni & Kerr (2001) successfully obtained milk production estimates using feed-forward ANNs. Artificial neural networks have been applied to predict milk yield in dairy sheep (Salehi et al., 1988). Kominakis et al. (2002) tested the usefulness of ANNs in predicting lactation, as well as daily test milk yield(s), in Chios dairy sheep based on a few (2-4) test-day records in the beginning of a lactation period. Grzesiak et al. (2003) compared the neural network and multiple regression predictions for 305-d lactation yield using partial lactation records. Sharma et al. (2007) used an ANN model to predict the first lactation 305-day milk yield using partial lactation records pertaining to Karan Fries crossbred dairy cattle. Hosseinia et al. (2007) estimated second parity milk yield and fat percentage of dairy cows based on first parity information using the neural network system. Njubi et al. (2010) applied ANNs to predict first lactation 305-d milk yield using test-day records in Kenyan Holstein Friesian dairy cows. These studies have shown that total lactation yield and short-term milk yield are positively correlated (Rayalu et al., 1984; Shrivastava et al., 1988; Brutta et al., 1989; Jain et al., 1991; Jadhav et al., 1998).
The aim of this study was to test the usefulness of ANNs in predicting 305-d milk yield, as well as test-day milk yield(s), in Brown Swiss cattle, based on a few (2-4) test-day records and some environmental factors (age, number of lactation and season of calving) at the beginning of a lactation period.

Material and Methods
This study was conducted using data on Brown Swiss cattle from Malya State Farm, Kırşehir, Turkey. Data on 2 640 Brown Swiss cattle were collected each month from July 2002 to January 2009. Usually milk was collected twice daily. The daily milk yield was estimated as the sum of two yields. Season of calving was defined as winter (January to March), spring (April to June), summer (July to September) and autumn (October to December). Statistical analyses were carried out using the SPSS 15.0 package. The ANN was designed using MATLAB 7.0.
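The record preparation described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used SPSS and MATLAB); the function names `season_of_calving` and `daily_milk_yield` are hypothetical and introduced only for illustration.

```python
def season_of_calving(month: int) -> str:
    """Map a calendar month (1-12) to the season classes defined in the text."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # January-March, April-June, July-September, October-December
    seasons = ("winter", "spring", "summer", "autumn")
    return seasons[(month - 1) // 3]


def daily_milk_yield(first_milking_kg: float, second_milking_kg: float) -> float:
    """Daily yield estimated as the sum of the two milkings of a test day."""
    return first_milking_kg + second_milking_kg
```

How the season classes were coded numerically for the network input (X4) is not stated in the text, so no coding is assumed here.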
The design of the ANN architecture and the methods of training, testing, evaluating and implementing the network are very important. The design consists of the choice of ANN algorithm, the structure (number of layers and number of neurons in the layers), the input and output functions, and the learning parameters. This research focuses on the back propagation learning method. A back propagation algorithm seeks to minimize the error term between the output of the neural net and the actual desired output value. The error term is calculated by comparing the net output with the desired output and is then fed back through the network, causing the synaptic weights to be changed in an effort to minimize error. The process is repeated until the error reaches a minimum value. Each cow was described with a group of eight variables (X): X1, average 305-d milk yield; X2, age; X3, number of lactation; X4, season of calving; X5-X8, average test-day milk yield in the 1st, 2nd, 3rd, and 4th month of sampling, respectively.
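The back propagation loop described above (forward pass, error term, error fed back, weights changed) can be sketched in plain Python. This is an illustrative toy, not the network of this study: the layer size, learning rate, number of epochs and the toy data set are all assumptions chosen only to keep the example small.

```python
import math
import random


def train_backprop(samples, n_hidden=3, lr=0.1, epochs=500, seed=0):
    """Batch gradient descent for a tanh network with one hidden layer.

    samples: list of (inputs, target) pairs. Returns the mean squared error
    recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(samples)
    history = []
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        sq_err = 0.0
        for x, target in samples:
            # Forward pass: hidden activations, then the network output.
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(w1, b1)]
            out = math.tanh(sum(w * hj for w, hj in zip(w2, h)) + b2)
            err = out - target
            sq_err += err * err
            # Backward pass: the error term is propagated through tanh'.
            d_out = err * (1.0 - out * out)
            g_b2 += d_out
            for j in range(n_hidden):
                d_h = d_out * w2[j] * (1.0 - h[j] * h[j])
                g_w2[j] += d_out * h[j]
                g_b1[j] += d_h
                for i in range(n_in):
                    g_w1[j][i] += d_h * x[i]
        history.append(sq_err / n)
        # Weight update: step against the averaged gradient.
        b2 -= lr * g_b2 / n
        for j in range(n_hidden):
            w2[j] -= lr * g_w2[j] / n
            b1[j] -= lr * g_b1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * g_w1[j][i] / n
    return history


# Toy data (an assumption): a smooth target on a 5 x 5 grid in [-1, 1].
toy = [((a / 2, b / 2), 0.5 * (a / 2) - 0.3 * (b / 2))
       for a in range(-2, 3) for b in range(-2, 3)]
history = train_backprop(toy)
```

Comparing `history[0]` with `history[-1]` shows how far the error term has been reduced by repeated weight updates.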
The data set was divided randomly into three subgroups: the training set (70%), the verification set (15%), and the test set (15%). The verification data set controls the flexibility of the model, preventing model overfitting. It provides a criterion to stop the learning before the model learns the training data, since an excessive adjustment of the model in the training data could lead to poor results when new data are presented to the model. This procedure for model selection is known in the neural network (NN) literature as the early stopping procedure (Fernandez et al., 2007). After fitting the model in the training data, the test data set was used to control whether the processed data set gave a true prediction.
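A minimal Python sketch of the 70/15/15 random split and of an early stopping rule is given below. The random seed, the `patience` parameter and the function names are assumptions made for illustration; the text does not state how the stopping criterion was configured.

```python
import random


def split_records(records, train_frac=0.70, verify_frac=0.15, seed=0):
    """Randomly split records into training, verification and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_verify = round(verify_frac * len(shuffled))
    train = shuffled[:n_train]
    verify = shuffled[n_train:n_train + n_verify]
    test = shuffled[n_train + n_verify:]  # the remainder, about 15%
    return train, verify, test


def early_stopping_epoch(verification_errors, patience=5):
    """Return the epoch with the lowest verification error, stopping the scan
    once the error has failed to improve for `patience` consecutive epochs."""
    best_error = float("inf")
    best_epoch = -1
    for epoch, error in enumerate(verification_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Usage with placeholder record identifiers for the 2 640 cows in the study.
train, verify, test = split_records(range(2640))
```

The weights kept would be those from the returned epoch, that is, the point at which the verification error stopped improving even though the training error might still be falling.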
In order to construct the network, the neural network newff function (Equation 1) was used. The constructed network was a back propagation ANN with three layers: input, hidden and output. The layers had 4, 8 and 1 neurons, respectively. There were seven input variables (X2, X3, X4, X5, X6, X7, and X8) and one output variable (X1). Figure 1 shows the architecture of the network for the prediction of 305-d milk yield. The tansig transfer function was applied for the input, hidden and output layers. The net was trained in 1e+5 cycles of element processing, which included epoch = 5000 and goal = 1e-10, where epoch means a single pass through the sequence of all input vectors, and goal means performance is minimized to the goal parameter. To assess the goodness of the ANN models, the root mean square error (RMSE) was used as a performance index. The MATLAB command newff generates a multilayer perceptron (MLP) neural network, which is called net (Beale, 2004; Koivo, 2008). Pre-processing data (e.g. standardization and normalization) may lead to an improvement in the learning process of ANN, which helps neural networks to predict better (Hosseinia et al., 2007). We trained the NN with normalized data. This normalization keeps the NN from giving primacy to some inputs because of their range instead of their importance in solving the problem. Data were normalized to the range -1 to +1 using the premnmx function. After the prediction, in order to see the output, we converted it back to the original scale using the postmnmx function.
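The normalization and the layered tansig network can be sketched in Python as follows. This is not the MATLAB code of the study: `premnmx_like` and `postmnmx_like` are hypothetical names that mimic the min-max mapping to [-1, +1], the weights are random placeholders rather than trained values, and reading the architecture as seven inputs feeding three tansig layers of 4, 8 and 1 neurons is an assumption.

```python
import math
import random


def premnmx_like(values):
    """Scale values linearly so that min maps to -1 and max maps to +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant variable")
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
    return scaled, lo, hi


def postmnmx_like(scaled, lo, hi):
    """Invert premnmx_like, returning values on the original scale."""
    return [0.5 * (s + 1.0) * (hi - lo) + lo for s in scaled]


def random_layers(sizes, seed=0):
    """Placeholder (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(x, layers):
    """Forward pass; tansig is mathematically identical to tanh."""
    activation = list(x)
    for weights, biases in layers:
        activation = [math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
                      for row, b in zip(weights, biases)]
    return activation


# Seven normalized inputs (X2-X8) -> 4 -> 8 -> 1 neurons (assumed reading).
layers = random_layers([7, 4, 8, 1])
output = forward([0.1] * 7, layers)
```

Because the output layer also uses tansig, the network output lies between -1 and +1, which is why the target (X1) has to be normalized before training and converted back afterwards.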
The same data were used for the multiple linear regression (MLR) model. The stepwise regression method was used to obtain the best prediction of the regression model or equation. After the MLR procedure, the R-squared values were obtained. The highest R-squared value was obtained for Equation 2, for which R-squared was 0.47. Partial regression coefficients were not significant (P >0.05). The MLR was as follows:
X1 = 5942.56 + 36.26 X2 - 85.18 X3 + 6.94 X4 - 4.22 X5 - 1.53 X6 - 21.34 X7 + 13.80 X8 (2)
Criteria of goodness of prediction of the ANNs and MLR were the Pearson correlation (r) between observed and predicted yields, the coefficient of determination (R-squared), the standard deviation (σ), the average difference (δ) between observed yields (OY) and predicted yields (PY), and the ratio (ρ) between the standard deviation of differences between observed and predicted yields and the observed mean value (Equation 4) (Kominakis et al., 2002). The root mean square error (RMSE), i.e. the square root of the mean squared difference between predicted and actual values, was also calculated, where, for the i-th record, ŷi was the value predicted by ANNs or MLR, yi was the actual value, and n was the total number of records (Salehi et al., 1998; Grzesiak et al., 2003). The ratio of mean (RoM) described by Friedrich et al. (2008) was also calculated. The difference between the predicted 305-d milk yield values and the real respective values was tested using ANOVA. The significant means were compared using Duncan multiple comparison tests.
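The regression equation and the goodness-of-prediction criteria can be sketched in Python as below. The coefficients are those of Equation 2; the sign convention for δ (observed minus predicted), the use of the sample standard deviation (n - 1) in ρ, and the function names are assumptions, since the original formulas are not reproduced in the text. RoM is omitted because its definition is not given here.

```python
import math

MLR_INTERCEPT = 5942.56
# Coefficients of X2..X8 as given in Equation 2.
MLR_COEFFS = (36.26, -85.18, 6.94, -4.22, -1.53, -21.34, 13.80)


def mlr_predict(x2_to_x8):
    """Evaluate Equation 2 for one cow (seven predictor values, X2..X8)."""
    if len(x2_to_x8) != len(MLR_COEFFS):
        raise ValueError("expected seven predictor values (X2..X8)")
    return MLR_INTERCEPT + sum(c * x for c, x in zip(MLR_COEFFS, x2_to_x8))


def prediction_criteria(observed, predicted):
    """Return r, R-squared, delta, rho and RMSE for paired yields."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    diffs = [o - p for o, p in zip(observed, predicted)]  # assumed OY - PY
    delta = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - delta) ** 2 for d in diffs) / (n - 1))
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_o * ss_p)
    return {"r": r, "R2": r * r, "delta": delta,
            "rho": sd_diff / mean_o, "RMSE": rmse}
```

A call such as `prediction_criteria(observed_yields, ann_predictions)` would be repeated for each ANN module and for MLR to fill the rows of a comparison like Table 1.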

Results and Discussion
The results of the statistics of the observed (OY) and predicted (PY) 305-d milk yield for the ANNs and MLR are presented in Table 1. Predicted 305-d mean milk production was very close to that observed for the ANN. However, 305-d milk yield prediction by MLR was lower than the observed 305-d milk yield by 629.4 kg (P <0.01). Grzesiak et al. (2003) used ANN and MLR in their study, showing there was no significant difference between observed values and predicted values (P >0.05), suggesting that ANN was appropriate for modelling 305-d milk yield. The average 305-d milk yield predicted by the ANN was lower than the average observed yield of the 49 reference cows by 13.12 kg (Table 1). The average ANN prognosis did not differ (P >0.85) from the actual average. For the MLR, the average difference was lower by -91.3 kg, but the average yield generated by MLR did not differ (P >0.24) from the actual average yield of the analysed cows (Sanzogni & Kerr, 2001).
Table 1 shows that the highest r value was 0.95 and the highest R2 was 0.90, obtained when the first four test-day records were entered into the net (X1-X8) via the ANN module. For 305-d milk yield, the minimum r value was 0.62 and the minimum R2 was 0.38 for the X1-X5 ANN module, while the minimum r value was 0.69 and the minimum R2 was 0.47 for the X1-X5 MLR module. Olori et al. (1999) pointed out that R2 ≥0.70 implies a very good fit for a model, while a model with R2 <0.40 should not be used for prediction.
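As a quick arithmetic check of these figures, the reported r values can be squared and compared with the thresholds of Olori et al. (1999). The sketch below is illustrative only; the label "intermediate" for values between the two thresholds is an assumption, since the text does not name that range.

```python
def fit_class(r: float) -> str:
    """Classify a model by R2 = r**2 using the thresholds cited in the text."""
    r2 = r * r
    if r2 >= 0.70:
        return "very good fit"
    if r2 < 0.40:
        return "should not be used for prediction"
    return "intermediate"  # assumed label for 0.40 <= R2 < 0.70


# r values reported in the text for the three cases discussed.
reported = {"X1-X8 ANN": 0.95, "X1-X5 ANN": 0.62, "MLR": 0.69}
classes = {name: (round(r * r, 2), fit_class(r)) for name, r in reported.items()}
```

Squaring the reported correlations reproduces the R2 values of 0.90, 0.38 and 0.47 given above, so only the X1-X8 ANN module clears the 0.70 threshold.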
With respect to RMSE values, the minimum value was observed with the X1-X8 ANN. The maximum RMSE value belonged to MLR. Njubi et al. (2010) found similar results using the first four test days. They concluded that ANN and MLR can both be used for prediction. Sanzogni & Kerr (2001) compared qualitative properties of MLR and two models of ANN. The MLR model, depending on the region, was characterized by an R2 coefficient ranging from 0.78 to 0.86. A classic ANN showed lower R2 coefficients (0.74-0.82). The RMS errors for ANN and MLR were similar. In modelling, a high R2 is not always a good criterion for prediction. Scatter plots of residuals can also be investigated (Alpar, 1997). Figure 2 shows the difference between the observed milk yields and predicted milk yields. According to this graph, the better predictions were obtained for (X1-X7) and (X1-X8) by ANN.
With respect to the δ value, the most accurate prediction occurred for the X1-X7 ANN. The X1-X8 ANN prediction was close to this prediction. The worst prediction, with regard to the δ value, was for MLR. For the ρ value, the best prediction was for the X1-X8 ANN. The ρ value was highest for the X1-X5 ANN module, and the ρ value of the MLR prediction was close to that of this ANN module.
With respect to RoM values, the ANN modules were similar to each other. The best prediction, using the 1st, 2nd, 3rd and 4th test records, occurred for the X1-X8 ANN. This ANN was very close to the X1-X7 ANN, in which the 1st, 2nd and 3rd test-day records were entered into the system as variables. The weakest prediction was seen when only the first test-day record was used as input (X1-X5 ANN). In general, the MLR predictions were weaker than the ANN predictions. Similarly, Sanzogni & Kerr (2001) reported that feed-forward ANNs gave better estimates of total milk production than multiple linear regression models, especially when prediction estimates were considered on a regional basis in Australia. Generally, the results encourage further studies on ANNs as an alternative to other biometric methods, with possible use of ANNs for practical on-farm analyses (Grzesiak et al., 2003). Comparison of results from this study and those from similar studies is important (Schaeffer et al., 2000; Jensen, 2001; Ferreira et al., 2002; Mostert et al., 2006). The use of the test-day model appears to be a better alternative to the 305-d lactation model because early selection based on test-days could reduce generation intervals (Swalve, 1998; 2000; Jensen, 2001) and therefore improve the accuracy of evaluation at farm level. Culling unproductive animals would improve overall farm profitability (Njubi et al., 2010).

Conclusion
In this paper, the (X1-X8) ANN has been proposed to predict the 305-day milk yield in dairy cattle. The ANN module provided a better prediction of the 305-d milk yield, especially when the 1st, 2nd, 3rd and 4th test-day records were included as variables in the network. This prediction method was based on the milk yield of the first four test days and predicted the real 305-d milk yield, suggesting that it can bring economic benefits to animal keepers.

Figure 1 Architecture of artificial neural networks (ANN) in MATLAB.

Figure 2 Scatter plot of differences between observed and predicted yields in four applications of artificial neural networks (ANNs) and multiple linear regressions (MLRs).

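Table 1 Statistics of the observed (OY) and predicted (PY) 305-d milk yield in the application cases of the artificial neural networks (ANN) and multiple linear regressions (MLR)
δ: the average difference between observed yields (OY) and predicted yields (PY); ρ: ratio; RMSE: root mean square error; RoM: ratio of mean.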