Short communication
Validation of the 50k Illumina goat SNP chip in the South African Angora goat

Copyright resides with the authors in terms of the Creative Commons Attribution 2.5 South African Licence.
See: http://creativecommons.org/licenses/by/2.5/za
Condition of use: The user may copy, distribute, transmit and adapt the work, but must recognize the authors and the South African Journal of Animal Science.
______________________________________________________________________________________
Abstract
Tools for the genomic evaluation of goats have generally lagged behind those for other species. However, the recent availability of the goat SNP50 consortium bead chip has marked a positive change for this small ruminant species. Polymorphic loci can differ greatly between breeds of the same species. Exclusion of fibre-producing breeds, such as the Angora goat, during the development of this genotyping array necessitates the validation of SNPs included on the chip to allow for genomic applications that would accelerate genetic progress in mohair yield and quality. Forty-eight unrelated Angora goats, displaying phenotypic variation in two important price-determining traits, namely fibre diameter and fleece weight, were genotyped with the goat SNP50 consortium bead chip. Results revealed that 46 983 SNPs (88.1%) of the 53 347 called SNPs were polymorphic (minor allele frequency (MAF) > 0.05). After quality control, 3 960 SNPs were filtered from further analysis for violating Hardy-Weinberg and call-rate parameter thresholds, leaving 43 759 (82%) of the 53 347 SNPs to be validated for downstream analysis. Observed and expected heterozygosity values of 0.365 and 0.370, respectively, were obtained for polymorphic SNPs. A total of 30 357 SNPs in linkage disequilibrium (LD) were removed to obtain a set of independent markers, resulting in a final SNP density of 1 SNP/~226 kb. Results indicate that the goat SNP50 bead chip was informative in the Angora goats that were studied, and should be useful in examining the underlying genetic variation.
______________________________________________________________________________________

Genetic research on the South African Angora goat has been extensive, and has included quantitative studies (Snyman & Olivier, 1996; Visser & Van Marle-Köster, 2009) pertaining to the estimation of genetic parameters for traits of economic importance and molecular research using microsatellites (Visser & Van Marle-Köster, 2009; Visser et al., 2010; 2011a; b; 2013). Although microsatellite markers have greater polymorphism on a per-marker basis, they are not as abundant as single nucleotide polymorphisms (SNPs), lack sufficient coverage across the genome, and have limited automation options (Vignal et al., 2002). The recent publication of a radiation hybrid map (Du et al., 2012), a reference genome sequence (Dong et al., 2013), and the consequent development of the 50K SNP chip (Illumina Inc., San Diego, Calif) (Tosser-Klopp et al., 2014) for the domestic goat marked a positive change for this species.
A key initiative in establishing the International Goat Genome Consortium (IGGC) was the development of a moderate- to high-density genotyping tool for the domestic goat (Tosser-Klopp et al., 2014). Because three separate SNP discovery projects were underway before their consolidation within the IGGC, the detection of variants for the goat species was done through two pipelines: one for dairy and mixed breeds, and the other for meat breeds. Fibre-producing breeds were not included in any of these pipelines, and thus not in the development of this chip (Tosser-Klopp et al., 2014). Research has shown that polymorphic loci differ greatly between breeds of the same species (Garrick, 2011), and high levels of polymorphism have been observed in breeds that were not included in the development of the goat SNP chip (Kijas et al., 2013; Tosser-Klopp et al., 2014). This suggests that high levels of polymorphism could be observed across goat breeds that were not sampled for SNP discovery, and thus the developed chip could be useful for Angora goats. South Africa is a major producer of mohair and the use of genomic technology for genetic improvement needs due consideration (Visser & Van Marle-Köster, 2014). The aim of this study was to validate the use of the goat SNP50 bead chip in the South African Angora population.
Whole blood samples of 48 unrelated Angora goat kids were collected from the DNA biobank for small stock research and conservation (Grootfontein Agricultural Development Institute, National Department of Agriculture). Animals with varying performance in two important price-determining phenotypic traits, namely fibre diameter and fleece weight, were chosen to ensure a diverse group of animals, and thus encapsulate a broad spectrum of genetic variation. For fleece weight, phenotypic values ranged from 0.5 kg for inferior animals to 2.4 kg for superior animals, and for fibre diameter phenotypic values ranged from 35.7 micron for inferior animals to 19.5 micron for superior animals. Blood samples were transported on ice to the University of Pretoria (UP). DNA was extracted from the whole blood samples at the Animal Breeding and Genetics Laboratory (UP), using Qiagen DNeasy Blood and Tissue kit® (Whitehead Scientific (Pty) Ltd, Cape Town, South Africa) according to the manufacturer's protocol. Genomic DNA quantity and quality for all samples were estimated with the NanoDrop spectrophotometer (NanoDrop ND-1000) and plate reader (SpectraMax® Microplate Reader) at the Genetics Department of the University of Pretoria, and the Qubit® 2.0 fluorometer (Life Technologies (Pty) Ltd, Carlsbad, Calif, USA) at the ARC Biotechnology Platform. The average DNA concentration obtained over all the samples was 74 ng/µL with an average 260:280 ratio of 2.03.
Genotyping was conducted at the ARC Biotechnology Platform with the Illumina goat SNP50 bead chip, which features 53 347 SNP probes distributed across the whole goat genome, and provides an average inter-SNP spacing of ~40 kb (Tosser-Klopp et al., 2012). Genotyping was conducted over three days, and included overnight whole-genome amplification, followed by fragmentation, precipitation and resuspension of the samples in a hybridization buffer. Hybridization of the DNA to the bead chips occurred overnight for 20 hours in a hybridization oven at 48 °C. After hybridization, the bead chips were washed, stained and dried. Processed bead chips were imaged with the Illumina iScan Reader, after which data were transferred to Illumina GenomeStudio 1.9.0 software for analysis.
GenomeStudio was used to generate PLINK (Purcell et al., 2007) input files, including the MAP (SNP panel) files and PED (genotypes per individual) files, which were used to perform sample-based and marker-based quality control measures to filter non-informative individuals and SNPs. Sample-based quality control was based on rates of missing genotypes, and SNP-based quality control was based on call rate, minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE). SNPs that had a call rate below 98%, MAF below 5% or violated HWE (P < 0.001) were removed from further analysis. After quality control, PLINK was used to calculate observed and expected heterozygosity across all autosomes. The --indep-pairwise (50 5 0.2) option was then used to remove SNPs that were in linkage disequilibrium in order to attain an independent set of SNPs for downstream analysis.
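As an illustration only, the marker-based thresholds described above can be sketched in Python. The genotype coding (0, 1 or 2 copies of one allele, with None for a missing call), the function names and the use of a one-degree-of-freedom chi-square test for HWE are assumptions made for this sketch; the study itself applied PLINK, whose HWE test may differ.

```python
import math

# Thresholds taken from the text; everything else here is hypothetical.
MIN_CALL_RATE = 0.98
MIN_MAF = 0.05
HWE_P_THRESHOLD = 0.001


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-d.f. chi-square test for Hardy-Weinberg proportions.

    Assumes the SNP is polymorphic, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(genotypes):
    """Apply the call-rate, MAF and HWE filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < MIN_CALL_RATE:
        return False
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    freq_b = (n_ab + 2 * n_bb) / (2 * len(called))
    if min(freq_b, 1 - freq_b) < MIN_MAF:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= HWE_P_THRESHOLD
```

With 48 animals, a single missing genotype gives a call rate of 47/48 (about 97.9%), so under these thresholds a SNP passes the call-rate filter only if it is called in every animal.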
Sample-based quality control was performed first. No individuals had a call rate below 98%, so none were removed; the average call rate across the 48 samples was 99.6%. A total of 53 347 SNPs were considered for analysis before marker-based quality control filtering, including 1 417 unmapped SNPs and 1 986 SNPs located on the X-chromosome. Results following marker-based quality control are indicated in Table 1. After the procedures for MAF filtering, it was found that the remaining 46 983 (88.1%) SNPs were polymorphic for the sampled individuals. An average MAF of 0.25 was found across the 48 samples. After quality control, the observed and expected rates of heterozygosity of the polymorphic autosomal SNPs were 0.365 and 0.370, respectively. Only autosomal SNPs and SNPs with known genomic location, that is, mapped SNPs, were considered for linkage-based filtering, and therefore only 41 016 of the 43 759 SNPs were used. A total of 30 357 SNPs were removed after LD pruning, resulting in a set of 10 659 independent SNPs. SNP densities were estimated before and after LD pruning and are summarized in Table 2.
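The bookkeeping behind these counts can be retraced with a few lines of Python. This is only a consistency check on figures quoted in the text; the variable names are hypothetical, and the covered genome length is not given in the text, so it is approximated here from the rounded pre-pruning density of 1 SNP/~59 kb.

```python
# Counts reported in the text.
total_snps = 53_347
polymorphic = 46_983
after_qc = 43_759
ld_input = 41_016      # autosomal, mapped SNPs entering LD pruning
ld_removed = 30_357

pct_polymorphic = round(100 * polymorphic / total_snps, 1)  # 88.1
pct_after_qc = round(100 * after_qc / total_snps)           # 82
not_ld_eligible = after_qc - ld_input                        # 2743
independent = ld_input - ld_removed                          # 10659

# Assumption: covered length approximated as SNP count x rounded spacing.
approx_length_kb = ld_input * 59
density_after_kb = approx_length_kb / independent            # about 227 kb
```

The approximate post-pruning spacing of about 227 kb per SNP agrees with the reported 1 SNP/~226 kb to within the rounding of the 59 kb figure.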
Quality control, including sample-based quality control and marker-based quality control, is an important step in utilizing SNP data, and is performed to remove potential biases introduced during study design, as well as errors from genotyping laboratory procedures and subsequent genotype calling (Anderson et al., 2010). Sample-based filtering relates to the sample call rate, and can be influenced by factors such as DNA quality and concentration, as well as genotype calling algorithms (Anderson et al., 2010). The average call rate of 99.6% obtained in this study is comparable with call rates of above 99.5% (Kijas et al., 2013) and 99.9% (Tosser-Klopp et al., 2014) reported in international studies in which Angora goats were genotyped using the 50K goat SNP chip. These results compare favourably with the average call rate of approximately 99.9% obtained across 10 goat breeds during the validation phase of the development of the goat SNP chip (Tosser-Klopp et al., 2014). In comparison with other validation studies performed for livestock in South Africa, the average call rate obtained can be compared with a range of 96.7% (Holstein) to 99.7% (Angus) obtained during an evaluation of the performance of the bovine SNP50 bead chip in South African cattle breeds (Qwabe et al., 2013).
The successful application of SNP arrays depends largely on the degree of polymorphism in the various breeds within each species (Fan et al., 2010). An average MAF of 0.25 obtained across all 53 347 loci compares favourably with average MAF ranges of 0.18 to 0.23 in horses using the equine SNP50 bead chip (McCue et al., 2013), and a value of 0.24 for African N'Dama and Sheko breeds using the bovine SNP50 bead chip (Matukumalli et al., 2009). During an evaluation of the bovine SNP50 bead chip in four South African cattle populations, Qwabe et al. (2013) reported MAF ranges between 0.17 for Nguni cattle and 0.22 for Holstein cattle.
After filtering out SNPs with low MAF, 46 983 SNPs (88.1%) proved to be polymorphic for the South African Angora goat, which is comparable with the number of polymorphic loci found in other goat breeds (Jinlan: 45 648, Skopelos: 50 908) that were not included in the SNP discovery process (Tosser-Klopp et al., 2014). However, differences in sample sizes among studies should be considered in comparing the results. The number of polymorphic loci compares favourably with values of 28 869 and 35 084 SNPs obtained for African N'Dama and Sheko breeds, respectively, during Illumina's bovine SNP50 content validation study (Matukumalli et al., 2009). The level of polymorphism compares favourably with the range of 35 843 for Nguni to 41 078 for Holstein when the bovine SNP50 chip was validated for cattle breeds in South Africa (Qwabe et al., 2013). All these comparisons were based on similar quality control parameters in previous studies and in the current study.
The average observed heterozygosity for the 48 animals was calculated as 0.365, which was similar to the expected value of 0.370. The observed heterozygosity was lower than the value of 0.442 obtained for Angora goats sampled by Kijas et al. (2013).
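For readers unfamiliar with the two quantities, the following Python sketch shows one common way to obtain them: observed heterozygosity as the fraction of heterozygous calls per SNP, and expected heterozygosity as 2pq under Hardy-Weinberg proportions, each averaged over SNPs. The genotype coding (0/1/2, None for missing) and the function name are assumptions for illustration, and this is not necessarily the exact calculation PLINK performed in the study.

```python
def heterozygosity(genotype_rows):
    """Return (mean observed, mean expected) heterozygosity over SNPs.

    Each row holds one SNP's genotypes coded as 0, 1 or 2 copies of an
    allele, with None for a missing call (hypothetical coding).
    """
    ho_sum = 0.0
    he_sum = 0.0
    n_snps = 0
    for row in genotype_rows:
        called = [g for g in row if g is not None]
        if not called:
            continue  # skip SNPs with no calls at all
        n = len(called)
        p = sum(called) / (2 * n)        # allele frequency
        ho_sum += called.count(1) / n    # observed heterozygotes
        he_sum += 2 * p * (1 - p)        # expected under HWE
        n_snps += 1
    return ho_sum / n_snps, he_sum / n_snps
```

An observed value slightly below the expected one, as reported here, corresponds to a small deficit of heterozygotes relative to Hardy-Weinberg proportions.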
To perform certain population genetic analyses such as principal component analysis (PCA), it is ideal to include only independent SNPs (Anderson et al., 2010). Markers that are in LD were inactivated in order to remove bias caused by linkage and to obtain a set of SNPs with the maximum number of independent markers (Davis et al., 2011). Following LD pruning, no pair of SNPs within a given window, usually 50 kb, should be correlated, taken as r² > 0.2 (Anderson et al., 2010). The SNP density obtained before LD-based filtering (1 SNP/~59 kb) was lower than the densities of 1 SNP/~53 kb obtained for Alpine and Saanen goats, but higher than the densities of 1 SNP/~67 kb and 1 SNP/~71 kb for SA Boer and Toggenburg goats, respectively (Brito et al., 2014). The SNP density obtained after LD-based filtering (1 SNP/~226 kb), however, was considerably lower than that implied by the original average inter-SNP distance range of 30 to 90 kb, as stated by Tosser-Klopp et al. (2014).
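The idea behind window-based LD pruning can be sketched as follows. This is a simplified greedy version written for illustration, not PLINK's implementation: it assumes complete genotypes coded 0/1/2, reads the three parameters as a window of 50 SNPs, a step of 5 SNPs and an r² threshold of 0.2 (an assumption about how the option was applied), and always drops the later SNP of a correlated pair.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as 0
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, window=50, step=5, r2_max=0.2):
    """Return indices of SNPs kept after greedy windowed pruning.

    `snps` is a list of genotype vectors in map order (hypothetical input).
    """
    keep = [True] * len(snps)
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(snps[i], snps[j]) > r2_max:
                    keep[j] = False  # drop the later SNP of the pair
        start += step
    return [i for i, kept in enumerate(keep) if kept]
```

Because every retained SNP has had its correlated neighbours removed, the spacing between retained markers grows, which is why the density falls after pruning.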
It can be concluded from the results that the breed-based application of the goat SNP50 bead chip for the South African Angora goat is possible. Even though no fibre-producing breed was included in developing this genotyping tool, the high level of polymorphism observed in this study suggests that the goat SNP50 bead chip will allow for applications such as genome-wide association studies, diversity studies, parentage verification, selection signatures and eventually genomic selection.

Table 1
Results following marker-based quality control

Table 2
SNP densities before and after linkage disequilibrium filtering