Next-generation sequencing for investigating the diversity of microorganisms and pathogenic bacteria in a water source

Purpose: To employ next-generation sequencing (NGS) to investigate the diversity of microorganisms and pathogenic bacteria in a water source in Tai Lake, China, in winter. Methods: Water samples from the same source were collected over a period of 3 months (December 2013 to February 2014), and their physicochemical characteristics were determined. DNA was extracted from the samples and amplified by polymerase chain reaction (PCR). The PCR products were sequenced on a MiSeq PE300 pyrosequencing platform. The 16S rDNA results were analysed using the visualization software Gephi, and a 16S rDNA gene pool of known pathogenic bacteria was established. Results: A total of 144,292 16S rDNA gene sequences were obtained and classified with the RDP Classifier. The average sequence length was 395.66 bp. The sequences comprised 580 operational taxonomic units (OTUs) classified into 16 phyla. A full-length 16S rDNA gene database of common pathogenic bacteria was established. BLAST searches against this database identified 17 species of pathogenic bacteria. The most abundant potential human pathogenic bacteria were affiliated with B. tribocorum. Most environmental factors had a significant impact on pathogenic bacteria. Conclusion: These results indicate that NGS can be used for the simultaneous detection of most recognized water-borne pathogenic bacteria. Variations in the microorganisms of a water source at different periods in winter can provide insight into the diversity of microorganisms in the water.


INTRODUCTION
Water-borne diseases are illnesses caused by intake of water that harbors pathogenic microorganisms [1]. In developing countries, thousands of people die from water-borne diseases every year [2]. Water-borne diseases are caused by water-borne pathogens, which are recognized as the microbial risk in drinking water [3]. In addition to human pathogens, drinking water may also harbor various animal and plant pathogenic microorganisms [4], which survive even after water processing [5]. These microorganisms persist in drinking water, thereby posing a danger to people's health.
Several advances have been made in the analysis of drinking water in an effort to make it safe for drinking. For instance, indicator organisms such as total coliforms, faecal coliforms and Escherichia coli have been used for nearly a century in monitoring the quality of drinking water [6]. However, each indicator organism has its own limitations, and sometimes it may not be effectively detected in the water [7]. The pure culture method has also been used to detect pathogenic microorganisms. However, a large number of microorganisms and pathogens cannot be detected by pure culture because, although they are viable, they are non-culturable [8]. In addition, water-borne microorganisms are characterized by low abundance, diversity, and complexity, which make them difficult to detect. Therefore, it is necessary to explore novel methods for detecting pathogenic microorganisms in drinking water.
Molecular techniques such as PCR, quantitative real-time PCR and DNA microarray technology have been explored for pathogen identification and quantification in water [9]. However, these molecular methods rely on specific oligonucleotide probes and on known sequences of the microbes to be detected [10]. Furthermore, these techniques cannot detect many pathogenic bacteria in a single assay, and are unable to characterize the diversity and abundance of pathogenic bacteria in water. Recent technological advances in next-generation sequencing (NGS) offer better prospects for detecting pathogenic microorganisms and investigating their diversity [11]. NGS can quickly generate huge amounts of DNA reads, and the technique is affordable [12].
In this study, the pathogens in a water source in East Tai Lake were analysed in winter by Illumina MiSeq sequencing of PCR products of the V4+V5 region of the bacterial 16S rDNA gene. The sequences obtained were assigned to taxonomic ranks.

EXPERIMENTAL

Study site and sample collection
Water samples were collected over a period of three months in winter (Dec 2013 to Feb 2014) from the same place (31°0'8.88"N, 120°27'21.15"E) of the water source at a biologically activated carbon drinking-water treatment plant in Wu Jiang, at 8:30 every day. The samples were labelled Dec 13, Jan 14 and Feb 14. For each sample, 10 L of water was obtained below the surface of the lake (about 1.2 m) near the water inlet. The samples were put into sterilized glass bottles and stored in a foam insulation box with frozen ice packs prior to transportation to the laboratory. Microorganisms were enriched from a 2 L water sample with a 0.22 µm tangential flow filter (TFF; Pellicon XL 50 cm², Millipore) on a sterile console at 4 °C, centrifuged at 5000 rpm for 5 min, and stored at -80 °C.

Water quality analysis
Physico-chemical characteristics of the water, such as dissolved oxygen, temperature, pH, electrical conductivity and total dissolved solids, were measured using portable rapid-detection meters (HACH HQ30D and HANNA HI98130). Turbidity, total organic carbon, total nitrogen, total phosphate, and total colony count were determined according to the standard methods.

DNA extraction and PCR amplification
A Soil DNA kit (OMEGA, USA) was used to extract DNA according to the manufacturer's instructions. The treatment replicates were pooled, and the extracted DNA was diluted in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) and stored at -20 °C. A Thermo NanoDrop 2000 spectrophotometer was used to determine the quantity and quality of the extracted DNA prior to 1 % agarose gel electrophoresis. The samples were selected for microbial community and pathogen analysis by 16S rDNA tag pyrosequencing, and for each sample an aliquot of the extracted DNA was used as a template for amplification. The 16S rDNA genes were amplified by PCR using the universal 16S primers 515F (GTGCCAGCMGCCGCGG) and 907R (CCGTCAATTCMTTTRAGTTT), targeting the V4+V5 hypervariable regions (about 392 bp) of the 16S rDNA gene. Amplifications were carried out in a total volume of 20 µL, containing 5×FastPfu buffer, 2.5 mM dNTPs, 5 µM forward primer, 5 µM reverse primer, FastPfu polymerase (TransGen AP221-02, China), and 10 ng template DNA. One-step PCR was performed using a mixture of TransStart FastPfu DNA polymerase and FastPfu DNA polymerase under the following thermal profile: 95 °C for 2 min, then 27 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s, followed by one cycle of 72 °C for 5 min and a 10 °C hold. The PCR products were pooled by sample label and subjected to 2 % agarose gel electrophoresis. The products were then isolated with AxyPrep DNA nucleic acid purification kits (AXYGEN, USA) and quantified using a QuantiFluor™-ST (Promega, USA).
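The two primers contain the degenerate IUPAC codes M (A or C) and R (A or G), so each primer stands for a small set of concrete oligonucleotides. The following Python sketch, given only as an illustration and not as part of the study's workflow, expands the primer sequences quoted above into those concrete sequences. The function name `expand_primer` and the reduced IUPAC table are assumptions introduced for this example.

```python
from itertools import product

# Subset of the IUPAC nucleotide code table: only the symbols that
# occur in the two primers are listed (assumption made for brevity).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # A or C
    "R": "AG",  # A or G
}


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences exactly as quoted in the text.
PRIMER_515F = "GTGCCAGCMGCCGCGG"
PRIMER_907R = "CCGTCAATTCMTTTRAGTTT"

forward_variants = expand_primer(PRIMER_515F)
reverse_variants = expand_primer(PRIMER_907R)

for name, variants in (("515F", forward_variants), ("907R", reverse_variants)):
    print(name, len(variants), "variant(s):", ", ".join(variants))
```

Each degenerate position doubles the number of variants, so a primer with one such position encodes two oligonucleotides and a primer with two positions encodes four.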

High-throughput pyrosequencing and sequence diversity analysis
Pyrosequencing was used to analyse the bacterial diversity of the source water. Equal amounts of the PCR products were mixed in a single tube and run on a MiSeq PE300 pyrosequencing platform (Majorbio BioPharm, Shanghai). The paired-end (PE) reads were merged into single sequences based on the overlap between the reads of each pair, using SeqPrep (https://github.com/jstjohn/SeqPrep). The quality of the reads and the outcome of merging were inspected. The minimum length of overlap was 15 bp, the maximum error ratio was 0.02, and the minimum matching rate was 0.9. The sequences were subsequently assigned to each sample by bar code. Valid sequences of 301-500 bp accounted for 99.99 % of the reads, and the average length was 395.66 bp. At the 97 % similarity level, the trimmed and unique 16S rDNA sequences were clustered into operational taxonomic units (OTUs) [13] using the RDP pyrosequencing pipeline. The results were deposited in the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) (accession number: SRP068068).
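The merging and length-filtering steps can be illustrated with a simplified Python sketch. It is not SeqPrep's algorithm: it ignores base qualities and the minimum matching rate, and uses only the two thresholds quoted above (minimum overlap of 15 bp, maximum mismatch ratio of 0.02) together with the 301-500 bp length window. The function names `merge_pair` and `keep_valid` are assumptions introduced for this example.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=15, max_mismatch_ratio=0.02):
    """Merge a read pair on the longest acceptable overlap.

    read2 is reverse-complemented, then the end of read1 is compared
    with the start of read2, trying the longest overlap first.
    Returns the merged sequence, or None if no overlap qualifies.
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches / overlap <= max_mismatch_ratio:
            # Keep read1 in full and append the non-overlapping part of read2.
            return read1 + rc2[overlap:]
    return None


def keep_valid(merged, min_len=301, max_len=500):
    """Keep merged sequences whose length lies in the valid window."""
    return [s for s in merged if s is not None and min_len <= len(s) <= max_len]


def mean_length(sequences):
    """Average length of a non-empty list of sequences."""
    return sum(len(s) for s in sequences) / len(sequences)
```

In this sketch a pair that cannot be merged yields `None` and is dropped by `keep_valid`, which mirrors the idea that only merged reads inside the length window count as valid sequences.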

Visual analysis of the data of microbial diversity
The 16S rDNA BLAST results were analysed using the visualization software Gephi (https://gephi.org) [14,15]. After high-throughput pyrosequencing and sequence diversity analysis, a csv file readable by Gephi was built, with the genera in the source column, the environment (sample) in the target column, and the abundance (number of 16S rDNA sequences) in a third column. Gephi can intuitively reveal patterns and trends, and highlight outliers, without requiring knowledge of graph theory. The csv files were uploaded to free cloud space (http://pan.baidu.com/s/1pKgzpof, share password: 8tub) and are freely accessible. The relationships between microorganisms and environment were displayed with the graph layout algorithm Force Atlas 2 [16].
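A csv file of this kind can be produced with a few lines of Python. The sketch below is illustrative only: the column headers `Source`, `Target` and `Weight` follow the usual convention for edge tables imported into Gephi, and the counts are hypothetical placeholders, not data from this study. The function name `write_edge_list` is an assumption introduced for this example.

```python
import csv
import io


def write_edge_list(rows, handle):
    """Write (genus, sample, count) tuples as an edge table.

    Each row links a genus (Source) to a sample (Target); the number of
    16S rDNA sequences is stored as the edge Weight. Genera that are
    absent from a sample (count of zero) produce no edge.
    """
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for genus, sample, count in rows:
        if count > 0:
            writer.writerow([genus, sample, count])


# Hypothetical counts, used only to show the file layout.
example_rows = [
    ("Flavobacterium", "Dec 13", 120),
    ("Flavobacterium", "Jan 14", 95),
    ("Arcicella", "Feb 14", 40),
    ("Methylotenera", "Dec 13", 0),
]

buffer = io.StringIO()
write_edge_list(example_rows, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Writing one edge per genus-sample pair gives a bipartite graph, so genera shared by several samples sit between those sample nodes once a force-directed layout is applied.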

Establishment of 16S rDNA gene pool of pathogenic bacteria
16S rDNA is highly conserved and specific, and has become a means of pathogen detection. The 16S rDNA gene pool of known pathogenic bacteria was established based on the virulence factor database (VFDB, http://www.mgc.ac.cn/VFs/main.htm) and the NCBI database. The gene pool was then uploaded to free cloud space (http://pan.baidu.com/s/1skgvvDr, share password: 784u) under the name pathogen_16s_rrna.fasta. The MiSeq PE300 high-throughput pyrosequencing results were searched against the gene pool using NCBI's BLASTN [17]. Sequences that were highly similar (> 97 %) to the 16S rDNA sequences of standard pathogenic bacteria, with length > 395 bp and scores > 360, were identified from the BLAST results.
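The filtering step can be sketched in Python as follows. The sketch assumes that the BLASTN results were written in the standard 12-column tabular format (`-outfmt 6`) and that the "score" threshold refers to the bit score; neither detail is stated in the text. The function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=97.0, min_length=395, min_score=360.0):
    """Keep hits above the identity, alignment length and score thresholds."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_FIELDS, delimiter="\t")
    kept = []
    for row in reader:
        if (
            float(row["pident"]) > min_identity
            and int(row["length"]) > min_length
            and float(row["bitscore"]) > min_score
        ):
            kept.append(row)
    return kept


# Hypothetical hits, used only to show how the thresholds act.
example = io.StringIO(
    "read_1\tpathogen_A\t99.2\t396\t3\t0\t1\t396\t10\t405\t0.0\t715\n"
    "read_2\tpathogen_B\t95.0\t396\t20\t0\t1\t396\t10\t405\t1e-150\t600\n"
    "read_3\tpathogen_C\t98.5\t380\t6\t0\t1\t380\t10\t389\t1e-170\t650\n"
)

hits = filter_hits(example)
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["length"], hit["bitscore"])
```

In the example, the second hit fails the identity threshold and the third fails the length threshold, so only the first is retained.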

Relationship between pathogenic bacteria and environmental factors
The influence of environmental factors on the abundance of pathogenic bacteria was investigated using redundancy analysis (RDA). This method can capture most of the biological information on the pathogenic microorganisms and identify the environmental factors that influence them. In the RDA plot, the angle between the line connecting a dot (pathogenic bacterium) to the origin and the ray of an environmental parameter indicates the correlation between that pathogenic microorganism and the environmental factor.
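The angle rule can be made concrete with a short Python sketch. Under the usual reading of an RDA plot, an acute angle suggests a positive correlation, an obtuse angle a negative one, and a right angle little or no correlation; the cosine of the angle summarizes this. The coordinates below are hypothetical and the function names are assumptions introduced for this example; the exact interpretation also depends on the scaling chosen for the plot.

```python
import math


def cosine_between(u, v):
    """Cosine of the angle between two 2-D vectors drawn from the origin."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding error.
    return max(-1.0, min(1.0, dot / norm))


def describe(u, v, tolerance=1e-9):
    """Classify the relationship implied by the angle between u and v."""
    cosine = cosine_between(u, v)
    if cosine > tolerance:
        return "positive"
    if cosine < -tolerance:
        return "negative"
    return "uncorrelated"


# Hypothetical plot coordinates: one species point, one environmental ray.
species_point = (0.8, 0.3)
factor_ray = (1.0, 0.0)

angle = math.degrees(math.acos(cosine_between(species_point, factor_ray)))
print(f"angle = {angle:.1f} degrees, relationship = {describe(species_point, factor_ray)}")
```

The same function applied to two environmental rays gives the sign of the relationship between the two factors themselves.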

RESULTS

Water quality
The physico-chemical parameters of the water source at the same site in the three winter months are shown in Table 1 and Table 2.

COD = chemical oxygen demand; TN = total nitrogen; TP = total phosphorus; AN = ammonia nitrogen; A = algae density
The chemical oxygen demand, total nitrogen and ammonia nitrogen fluctuated over the three months. The algae density did not change significantly, but the mean values of total phosphorus increased significantly (Table 2), implying that the organic content of the water increased with less precipitation in winter. The high turbidity may protect microorganisms in the water.

Microbial diversity and richness
To give a comprehensive insight into the microbial communities in the water source, bacterial 16S rDNA gene sequences were amplified using the universal primer pair 515F/907R. A total of 144,292 high-quality bacterial sequence reads (≥200 bp, average length of about 395.66 bp) were generated from the PCR amplicons by MiSeq PE300 pyrosequencing.
NGS results showed that the microbial communities included 16 different phyla: Verrucomicrobia, Spirochaetae, Proteobacteria, Planctomycetes, Nitrospirae, Lentisphaerae, Gemmatimonadetes, Fusobacteria, Firmicutes, Elusimicrobia, Cyanobacteria, Chloroflexi, Chlorobi, Bacteroidetes, Actinobacteria, and Acidobacteria. A few sequences, such as BD1-5 and TM6, could not be classified. At the OTU level, the differences in relative abundance between samples are shown in Figure 1A. A total of 215 genera were obtained, and the relative abundance of the microbial community differed among the three samples. The relative abundance of microbes in the Dec 13 sample was similar to that in the Jan 14 sample, but different from that in the Feb 14 sample (Figure 1B). Chloroplast norank, hgcI clade, Candidatus Planktophila, Arcicella, Flavobacterium, and Methylotenera were the most dominant bacteria in the three samples (Figure 1C).

Visualized microbial diversity
The relationships between samples and bacterial presence were visualized with Gephi. The resulting figure was uploaded to Google Drive and is available at https://drive.google.com/file/d/0B9IKtmqDKUJKMHpENm9yLXdhblU/view?usp=sharing. There were 215 genera in the three samples, among which 16 were only in the Feb 14 sample, 11 were in the Jan 14 sample, 9 were in both the Dec 13 and Feb 14 samples, 8 were in both the Dec 13 and Jan 14 samples, and 33 were in both the Jan 14 and Feb 14 samples. The Gephi visualization reflects the presence and affiliation of a microorganism more intuitively than a heatmap.

Verified pathogen sequence
A pathogenic bacterial gene pool including 468 genera and 1608 full-length 16S rDNA sequences was built for 16S rDNA BLAST searches. These sequences were long enough (> 390 bp) to avoid errors caused by short sequences. The sequences obtained in this study were searched against the pathogenic 16S rDNA gene pool generated in this study. The sequences most similar to the standards were selected (score > 340, similarity > 97 %, E-value < 10⁻⁵, and length > 395 bp). A total of 17 species of pathogenic bacteria were obtained after BLAST searches against the 16S rDNA gene database (Table 3). These include Brucella suis, Clostridium botulinum B str., Legionella pneumophila Paris, Clostridium beijerinckii, Bartonella tribocorum, Salmonella enterica (serovar Typhimurium), Bacillus thuringiensis serovar konkukian str., S. haemolyticus, M. smegmatis, P. fluorescens, S. agalactiae, Y.

Relationship between pathogenic bacteria and environmental factors
Most environmental factors, including dissolved oxygen, pH, temperature, total phosphate, and algae density, had a significant impact on pathogenic bacteria (Table 4). As shown in Figure 3, algae density was positively correlated with conductivity, and only a few pathogenic microorganisms, such as C. botulinum B str. and Bartonella tribocorum, were negatively correlated with algae density and conductivity. Total phosphate was negatively correlated with temperature, and dissolved oxygen was negatively correlated with pH.

DISCUSSION
The present study analysed the physicochemical parameters and the biodiversity of a water source in Tai Lake in winter. The results showed that the bacterial species were relatively stable, and most of the bacteria were detected in all three water samples. The abundance of pathogenic bacteria changed with environmental factors.
The physico-chemical parameters investigated in this research were used to assess the quality of water from Tai Lake. From Dec. 2013 to Feb. 2014, the pH values were stable, ranging from 7.77 ± 0.05 to 7.84 ± 0.07, in accordance with the pH of natural waters. Natural waters have pH values between 6.5 and 8.5, which makes them suitable as drinking water [17]. The temperature of the water source decreased from Dec. 2013 to Feb. 2014, in keeping with changes in ambient temperature. However, these changes do not affect the quality of drinking water. Dissolved oxygen concentration increased as temperature decreased. Higher concentrations of dissolved oxygen promote the growth of aerobic organisms, such as algae. Moreover, the instabilities in turbidity and chemical oxygen demand may be related to the dredging of Tai Lake.
NGS analysis identified 144,292 high-quality bacterial sequence reads. After a chimera check and removal of obviously erroneous reads, the validated bacterial reads were assigned to genus-level taxonomic ranks with the RDP Classifier. The 16S rDNA sequence in the bacterial genome is highly conserved. The RDP Classifier assigned the gene sequences from the water samples to 16 phyla and 215 genera. The relative abundance of microbial communities differed among the three samples, which may be due to environmental factors. Microbial diversity was shown in the Gephi visualization, which reflects the presence and classification of a microorganism more intuitively than a heatmap. Gephi can also reflect the relationship between microorganisms and environment. The Gephi visualization is considered suitable for displaying the microbial species of multiple samples in biodiversity studies [14].
Therefore, a 16S rDNA gene pool of pathogenic bacteria was built for BLAST searches of the 16S rDNA sequences obtained from sequencing. This revealed the pathogenic bacteria at the species level in the source water samples. It is difficult to understand the relationships between pathogenic bacteria and environmental factors because of the complexity and variety of those factors. In the present study, the influence of environmental factors on the abundance of pathogenic bacteria was investigated using RDA. The abundance of some 16S rDNA sequences of pathogenic bacteria, such as B. tribocorum and C. botulinum B str., changed with environmental factors. Total phosphate was negatively correlated with temperature. It has been demonstrated that total phosphate is associated with the eutrophication of water, which is beneficial to the growth of pathogens [24].
In addition, it is well known that the concentration of dissolved oxygen increases as temperature decreases, which promotes the growth of pathogens because most pathogens are aerobic. Therefore, reducing emissions of nitrogen and phosphorus, and controlling eutrophication of source water bodies, may be effective ways of reducing the risk of pathogenic microorganisms leaking into drinking water.

CONCLUSION
Comprehensive analysis by NGS of the 16S rDNA sequences of bacteria and pathogens in water samples from Tai Lake in winter indicates that the pathogenic bacterial species were relatively stable in the lake in winter, suggesting a potential microbial risk in the water source. The abundance of pathogenic bacteria correlates with environmental factors. The methods used in this study to detect potential pathogenic bacteria in the water source are more comprehensive than those used in previous studies. Thus, the results afford a reliable insight into the identity of pathogenic bacteria in this water source.

ASSOCIATED CONTENT
Supporting information: The additional information includes the list of common pathogenic bacteria of humans, animals and plants; the 16S rDNA gene database of common pathogenic bacteria as a fasta file; and the Gephi data source as a csv file. These materials are available free of charge via the internet at yun.baidu.com and drive.google.com with the share password.
pseudotuberculosis, L. welshimeri, Bordetella parapertussis, P. fluorescens and P. syringae pv. Among the NGS sequences, 28,604 may belong to pathogens, accounting for about 19.82 % of total sequences. The abundance of the pathogenic bacteria and their similarities are shown in Figure 2. B. suis, B. tribocorum and S. thermophilus were the most abundant species of pathogenic bacteria in the water samples.

Figure 1: Microbial diversity and abundance in the water source from Tai Lake in winter. (A) Venn diagram; (B) relative abundance of the microbial community; and (C) heatmap showing the diversity and abundance of microorganisms in the water, as well as the differences among the three water samples.

Figure 3: RDA graph reflecting the relationships between the pathogenic bacteria and the environment. (A) Relationships between algae density, conductivity and pathogenic bacteria; (B) relationships between total phosphate, temperature and pathogenic bacteria; (C) relationships between dissolved oxygen, pH and pathogenic bacteria.

The Illumina MiSeq PE300 pyrosequencing platform yielded so many sequences that it was difficult to identify the potentially pathogenic bacterial sequences. Although many pathogenic bacteria databases have been established, such as NMPDR [18], EuPathDB [19] and VFDB [20], they are difficult to use for BLAST searches of 16S rDNA sequences from NGS.
After analysis, B. suis, B. tribocorum and S. thermophilus appeared to be the most abundant species of pathogenic bacteria in the water samples. B. suis is a common pathogenic bacterium. Lesions caused by B. suis include lesions in reproductive organs and joints. In particular, several species may lead to severe diseases in humans [21]. B. tribocorum has been isolated from rodents in different parts of the world in recent years. However, there is currently no evidence that B. tribocorum has zoonotic potential [22]. S. thermophilus is a major dairy starter used for the manufacture of yoghurt and cheese [23].

Table 1: Physical quality parameters of the three water samples

Table 2: Biological and chemical water quality parameters of the three samples

Table 3: Possible pathogenic bacteria identified by BLAST searches against the established 16S rDNA gene database