Analysis of Intronless Genes Involved in Oscillation and Differentiation

The genomes of higher eukaryotes are replete with intron-containing genes. Transcription of these genes produces precursor mRNAs containing intervening sequences, which are subsequently removed while the exons are spliced together to form the mature mRNA. However, a small proportion of eukaryotic protein-coding genes are intronless and therefore bypass post-transcriptional splicing. Although a large proportion of intronless genes are known to code for certain types of proteins, their specific role in the genomes of higher organisms remains unclear. This research set out to elucidate the functions of intronless genes in humans by studying their involvement in the oscillatory gene expression that occurs in the pre-somitic mesoderm (PSM) of the developing embryo. Twenty-seven (27) human homologs of mouse oscillatory genes were analysed using various bioinformatics databases to determine the number of exons present in each. Two intronless genes, NRARP and ID1, were identified; both are associated with the Notch signalling pathway of the segmentation clock. They represent 7.4% of the oscillatory genes analysed. No intronless gene was found in the Wnt and FGF signalling pathways, the two other pathways known for oscillatory gene expression. The proteins encoded by the intronless genes are involved in several important biological processes, including angiogenesis, cell cycle control and the regulation of cellular senescence. Although oscillatory genes had fewer introns than non-oscillatory genes, the intronless genes were not implicated in regulating the precise timing events of the segmentation clock. The result may also indicate that the rapid expression rate of the oscillatory genes in the PSM favours reduced intron length.
DOI: https://dx.doi.org/10.4314/jasem.v25i9.1 Copyright: Copyright © 2021 Osemwenkhae and Aguebor-Ogie.
This is an open access article distributed under the Creative Commons Attribution License (CCL), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Dates: Received: 09 May 2021; Revised: 12 August 2021; Accepted: 12 September 2021

The genomes of higher eukaryotes are replete with intron-containing genes. Most genes in the human genome contain introns, with an average of 8-9 introns per gene, the highest proportion among eukaryotes (Roy and Gilbert, 2006). Transcription of these genes produces precursor mRNAs containing intervening sequences (introns), which are subsequently removed while the flanking regions (exons) are spliced together to form the mature mRNA. Only about twenty (20) nucleotides are transcribed per second, at a cost of two ATP molecules per nucleotide, which makes transcription a slow and expensive process. Long introns in highly expressed genes are therefore kinetically costly (Castillo-Davis et al., 2002). Introns in highly expressed genes carry several drawbacks: they slow the rate at which the gene product is made, they waste energy on transcribing and then splicing out the intronic sequence, and they compromise transcriptional fidelity through alternative and aberrant splicing. Intronless genes, in contrast, do not go through the expensive process of post-transcriptional splicing, which may explain why they have survived throughout evolution. Furthermore, the absence of introns may enable these genes to be transcribed efficiently and with a potentially higher rate of protein expression (Gentles and Karlin, 1999). The proportion of intronless genes in the eukaryotic genome is generally thought to be less than 3% (Grzybowska, 2012), with some functional classes, such as G Protein-Coupled Receptors (GPCRs), histones and type 1 interferons, having the highest proportion of intronless genes (Doenecke and Albig, 2005; Gentles and Karlin, 1999). The developing embryo undergoes a plethora of differentiation processes.
During embryonic development, segmentation of the vertebrate body is achieved during somitogenesis and involves the periodic separation of small balls of epithelialized cells, known as somites, from the paraxial mesoderm strips. Somites, generated sequentially along the antero-posterior (AP) axis, derive from a growing mesenchymal tissue called the pre-somitic mesoderm (PSM). Somitogenesis is a rhythmic process that relies on the activity of a molecular oscillator known as the segmentation clock, which the developing embryo uses to control body segment length and number (Schroter and Oates, 2010). This clock also drives the expression of genes whose mRNA levels follow a dynamic expression sequence that is repeated in the PSM each time a new pair of somites forms. Somitogenesis involves the interplay of the Notch, Wnt and FGF signalling pathways and is characterized by the rapid expression of cyclic genes (Aulehla and Herrmann, 2004; Pourquie, 2003; William et al., 2007). Notably, genes with rapidly changing expression levels, as well as constitutively expressed genes, are poor in intron content (Eisenberg and Lenano, 2003; Doenecke and Albig, 2005; Jeffares et al., 2008). The expression of cyclic genes in the PSM is known to recur every 30 minutes in zebrafish, 90 minutes in chick and 2 hours in mice (Kageyama et al., 2012). The precise timing of this process is essential for maintaining the oscillatory expression that drives the formation of somites and other structures. Investigating intronless genes as a class is crucial to understanding the advantages that the absence of introns confers on a gene. A comprehensive understanding of their roles is also needed to compare and contrast the functional features of intronless and intron-containing human genes.
The present study was therefore carried out to identify the intronless genes (and their targets) in the human genome that exhibit a cyclic mRNA expression pattern during embryonic development, with a view to determining their functions in the context of development.
Databases and software used: The ENSEMBL database (release 72) was used to identify human homologs of mouse oscillatory genes and to determine which of these genes are intronless. OMIM (Online Mendelian Inheritance in Man), UniProtKB (Universal Protein Resource Knowledgebase) and BioGPS were used to determine the function(s) of the intronless genes identified. Internet Explorer 8 was used to access the databases, Microsoft Excel was used to collate, process and interpret the data generated, and Microsoft Word was used to compile the results.
Research design flowchart: In the first step, human homologs of established mouse oscillatory genes were identified. On the ENSEMBL homepage, the 'human' genome category was selected. The gene symbol of each oscillatory gene involved in the segmentation of the mouse PSM (e.g. Hes7) was used individually to query the entire human genome in order to obtain a match. The 'Gene ID' link for the match was used to obtain details such as the gene description, gene location, number of transcripts and transcript IDs, as well as the number of exons (coding regions) in each transcript. An example using the mouse oscillatory gene, Hes7, is presented below in

Identification of human intronless oscillatory genes:
Genes that possess a single exon are classified as intronless (Doenecke and Albig, 2005; Grzybowska, 2012). On the ENSEMBL page, the 'transcript ID' link was used to inspect each human homolog of the mouse oscillatory genes and determine the number of exons contained in its transcripts/splice variants. The two intronless genes identified were subjected to further analysis using OMIM, UniProtKB and BioGPS in order to determine their functions in development and differentiation. The gene names were used to search each database with the default graphical user interface (GUI) settings. Each search returned a list of genes containing the keyword/gene symbol, with a link to the entry for the gene of interest and further information on its chromosome location.
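The single-exon criterion described above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study, which was carried out manually through the ENSEMBL web interface. The function name, the data layout and every transcript label or exon count other than the single-exon status of NRARP and ID1-002 are hypothetical placeholders, and treating a gene as a hit when any one of its transcripts has a single exon is an assumption made for this sketch.

```python
from typing import Dict, List, Tuple

# gene symbol -> list of (transcript name, exon count)
ExonTable = Dict[str, List[Tuple[str, int]]]


def single_exon_transcripts(table: ExonTable) -> Dict[str, List[str]]:
    """Return, per gene, the transcripts that consist of exactly one exon.

    Genes with no single-exon transcript are left out of the result.
    """
    hits: Dict[str, List[str]] = {}
    for gene, transcripts in table.items():
        single = [name for name, n_exons in transcripts if n_exons == 1]
        if single:
            hits[gene] = single
    return hits


# Illustrative input. Only the single-exon status of NRARP and ID1-002 comes
# from the text; the other names and counts are hypothetical placeholders.
example: ExonTable = {
    "NRARP": [("NRARP-transcript", 1)],
    "ID1": [("ID1-002", 1), ("ID1-other", 2)],
    "GENE_X": [("GENE_X-a", 4), ("GENE_X-b", 3)],
}

if __name__ == "__main__":
    for gene, names in single_exon_transcripts(example).items():
        print(gene, names)
```

Reporting transcripts rather than only genes keeps visible the case where a gene has both single-exon and multi-exon transcripts.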

Determination and calculation of the length of intron:
The total intron content in base pairs (bp) was determined for each oscillatory gene by summing the lengths of all the introns contained in its coding regions. The 'exon' link on the ENSEMBL webpage was used to obtain the lengths of the introns and exons in a particular transcript.
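When only exon coordinates are available, the same total can be derived from the gaps between consecutive exons. The sketch below is a minimal illustration, assuming 1-based inclusive exon coordinates on a single transcript; the function name and the example coordinates are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def total_intron_length(exons: List[Tuple[int, int]]) -> int:
    """Sum the intron lengths (bp) implied by a transcript's exon coordinates.

    Each exon is a (start, end) pair, 1-based and inclusive. An intron is the
    gap between the end of one exon and the start of the next, so a
    single-exon transcript has a total intron length of 0.
    """
    ordered = sorted(exons)
    total = 0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap < 0:
            raise ValueError("exons overlap; coordinates are inconsistent")
        total += gap
    return total


if __name__ == "__main__":
    # Hypothetical three-exon transcript: introns of 100 bp and 600 bp.
    print(total_intron_length([(100, 199), (300, 399), (1000, 1099)]))
    # Hypothetical single-exon (intronless) transcript.
    print(total_intron_length([(1, 500)]))
```

Sorting the exons first makes the calculation independent of the order in which the coordinates are listed.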

RESULTS AND DISCUSSION
This study examined the presence and function of intronless genes in the segmentation oscillatory pathway. Twenty-seven (27) genes were analysed using various bioinformatics tools, and the results were grouped according to the signalling pathway with which each gene is associated.
Oscillatory genes associated with the Notch signalling pathway: The result of the analysis of the human oscillatory genes associated with the Notch signalling pathway is presented in Table 1. [...] transcripts, one HES1 transcript, four HEY1 transcripts, one each of ID2 and LFNG and two NKD1 transcripts are non-coding. Of the eleven genes investigated in this pathway, only two, NRARP and ID1 (one transcript, ID1-002), were found to be intronless. The result also shows that some of the transcripts of the oscillatory genes have no protein products.
Oscillatory genes associated with the Wnt signalling pathway: Table 2 displays the result of the ENSEMBL analysis of the oscillatory genes associated with the Wnt signalling pathway. The ID prefixes ENSG and ENST denote ENSEMBL gene and ENSEMBL transcript identifiers, respectively. The transcript names identify the alternatively spliced transcript variants of each gene. No intronless gene is present in the Wnt pathway: all nine oscillatory genes in this pathway contain introns. Only two of them, MYC and PHLDA1, have alternatively spliced transcripts that are all protein coding. One transcript each from the AXIN2, CYR1, SP5 and TNFRSF19 genes, as well as two transcripts from the DKK1 gene, were found to be non-coding.
Oscillatory genes associated with the FGF signalling pathway: The result for the oscillatory genes associated with the FGF signalling pathway is presented in Table 3. All seven oscillatory genes identified in this pathway were found to be intron-containing genes. The result shows that one transcript each of the DUSP6 and SHP2 genes does not code for a protein. The HSPG2 and BCL2L11 genes were found to have 16 and 17 transcripts respectively, the highest numbers of transcripts among all the genes investigated. Transcript HSPG2-001 has 97 exons, the highest number of exons among the genes investigated. Transcripts HSPG2-201 and PTPN11-001 were found to contain 19 and 16 exons respectively. EGR1 was found to have one transcript, while SPRY2 has three transcripts, all of which are protein coding.
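The reported proportion of intronless genes follows directly from the per-pathway counts given above (eleven Notch, nine Wnt and seven FGF genes, with two intronless genes, both in the Notch group). The short Python sketch below reproduces that arithmetic as a check; the variable names are illustrative.

```python
# Per-pathway counts taken from the text: (genes analysed, intronless genes).
pathway_counts = {
    "Notch": (11, 2),  # NRARP and ID1
    "Wnt": (9, 0),
    "FGF": (7, 0),
}

total_genes = sum(n_genes for n_genes, _ in pathway_counts.values())
total_intronless = sum(n_intronless for _, n_intronless in pathway_counts.values())
percent_intronless = 100 * total_intronless / total_genes

if __name__ == "__main__":
    print(f"{total_intronless}/{total_genes} = {percent_intronless:.1f}%")
```

The total of 27 genes and the resulting 7.4% agree with the figures quoted in the abstract and the conclusion.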

Functions of the identified intronless genes:
The functions of the two identified intronless oscillatory genes are presented in Table 4. The report of Zebede and Hara (2001) confirms the function of the ID1 protein in cell cycle control and cellular senescence. The second intronless gene identified codes for a 12,492-Da protein (114 amino acid residues) known as Notch-regulated ankyrin repeat-containing protein (NRARP). Analysis using the UniProtKB database revealed that the protein possesses two ankyrin (ANK) repeats (positions 50-79 and 83-112) and is involved in the formation of somites. Furthermore, NRARP is involved in blood vessel endothelial cell proliferation during sprouting angiogenesis, negative regulation of the Notch signalling pathway, negative regulation of T cell differentiation, patterning of blood vessels and regulation of cell-cell adhesion. These results are in line with the reports of Phng et al. (2009) and Yun and Bevan (2003), which implicate NRARP as the molecular link between Notch and Wnt signalling in endothelial cells that controls the stability of new vessel connections, and as a major player in T cell development.

Correlation of the length of intron with gene function:
The intron length of the oscillatory and non-oscillatory genes involved in the three crucial pathways of somitogenesis was determined and compared (Figs 2, 3 and 4, respectively). For the genes associated with the Notch signalling pathway (Figure 1), the result revealed that all the oscillatory genes (with the exception of MAML3 and NKD1) contained fewer introns than the non-oscillatory gene, RBPJ. The result obtained for the Wnt signalling pathway genes shows the same trend. The same comparison was made for the genes associated with the FGF signalling pathway (Figure 3), and the result again shows that the non-oscillatory genes in this pathway contained more introns than the oscillatory genes. Taken together, the charts indicate that the oscillatory genes possessed fewer introns than the non-oscillatory genes. Genes with rapidly changing expression levels, as well as constitutively expressed genes, are known to be poor in intron content (Eisenberg and Lenano, 2003; Doenecke and Albig, 2005; Jeffares et al., 2008). The expression of cyclic genes in the PSM is known to recur every 30 minutes in zebrafish, 90 minutes in chick and 2 hours in mice (Kageyama et al., 2012). To determine whether the rapid expression of the oscillatory genes correlates with intron length, this research also compared the intron length of the cyclic and non-cyclic genes involved in each of the three pathways. The results obtained suggest that oscillatory genes contained fewer introns than non-oscillatory genes. This is in agreement with earlier observations on the presence of short introns in rapidly expressed genes (Castillo-Davis et al., 2002).
A possible explanation is that long introns in these rapidly expressed genes would slow the rate at which the gene products are made and would also waste energy on transcription, alternative splicing of the introns and the subsequent translation of the gene product.
Conclusion: Most eukaryotic genes are characterized by multiple exons separated by introns of varying length. Investigating intronless genes as a class is crucial to understanding the advantage that the absence of introns may confer on a gene. In this research, we identified two intronless genes (NRARP and ID1) that are associated with the Notch signalling pathway of the segmentation clock, representing 7.4% of the total oscillatory genes investigated. The proteins encoded by the intronless genes are involved in several important biological processes, including angiogenesis, cell cycle control and the regulation of cellular senescence. However, neither NRARP nor ID1 is involved in maintaining the precise timing events that occur during somitogenesis. Microarray studies covering a larger fraction of mouse genes, combined with improved amplification techniques, will enhance the identification of more oscillatory genes and their targets, thereby increasing the possibility of identifying more intronless genes in this important pathway.