Gene expression regulation by upstream open reading frames in rare diseases

Upstream open reading frames (uORFs) constitute a class of cis-acting elements that regulate translation initiation. Mutations or polymorphisms that alter, create or disrupt a uORF have been associated with several human disorders, including rare diseases. In this mini-review, we highlight the mechanisms of uORF-mediated translational regulation and describe recent examples of their deregulation in the etiology of human rare diseases. Additionally, we discuss new insights arising from ribosome profiling studies and reporter assays regarding uORF features and their intrinsic role in translational regulation. This knowledge is of utmost importance for designing and implementing new or improved diagnostic and/or treatment strategies for uORF-related human disorders.


Introduction
Over the past few years, many genome-wide studies [RNA deep sequencing, ribosome profiling (RiboSeq), mass spectrometry-based methodologies] have pointed to translation as a major regulator of gene expression, recognizing it as a key post-transcriptional mechanism by which cells rapidly change their expression patterns in response to a wide variety of stimuli [1-9]. RiboSeq is the most promising genome-wide approach to monitor translation in vivo, and it also provides new information about the mechanisms of protein synthesis and its control [5,10].
Translation is a tightly controlled process that comprises four different steps: initiation, elongation, termination and ribosome recycling. Translation initiation is the most regulated step of translation [11]. In eukaryotes, translation initiation starts with the recruitment of the cap-binding protein complex, namely eukaryotic initiation factor 4F (eIF4F), which comprises eIF4E, eIF4A and eIF4G, to the mRNA 5' end. The unwinding of the 5'UTR by the helicase eIF4A enables binding of the 40S ribosomal subunit. The association of eIF1, eIF1A and eIF3 with the 40S subunit facilitates the binding of the ternary complex eIF2-GTP-Met-tRNAi. The resulting 43S preinitiation complex can land next to the cap and scans in a 5' to 3' direction until it recognizes an initiation codon base-pairing with methionine initiator-tRNA (Met-tRNAi). Upon recognition of the start codon, eIF5 stimulates GTP hydrolysis, resulting in the release of eIF2-GDP and probably of other 40S-bound initiation factors. eIF5B catalyzes the joining of the 60S subunit to form an 80S ribosome, and elongation can start (reviewed in [11]).
Several cis-acting elements are involved in the regulation of translation initiation, for instance, internal ribosome entry sites (IRESs) and upstream open reading frames (uORFs) [1,11]. IRESs are highly structured RNA sequences that allow the recruitment of the 40S ribosomal subunit directly to the initiation codon or to its vicinity, promoting translation initiation via a cap-independent mechanism [12,13]. A uORF, on the other hand, is defined as a sequence beginning at an initiation codon within the 5' untranslated region (5'UTR) of a transcript, in frame with a termination codon positioned upstream or downstream (overlapped uORF) of the main ORF initiation codon [1,14,15]. uORFs, the most abundant and best understood class of small ORFs (sORFs), encode peptides of up to 100 amino acids and play different biological roles in the cell [16,17]. uORFs are typically described as repressors of translation initiation at the main ORF [18-22]. These cis-regulatory elements are prevalent genome-wide: approximately half of human transcripts are estimated to contain at least one uORF, and many of them are conserved among species, suggesting an evolutionary selection of functional uORFs [18,23-27]. Genes that require tightly controlled translational regulation, such as oncogenes and genes involved in cell growth, differentiation, development and stress response, are the typical classes of genes harboring uORFs [1,14,23]. It follows that mutations or polymorphisms that disrupt, create or modify uORFs can potentially be associated with the development of several disorders, including rare diseases [1,14]. Additionally, uORFs and IRESs in the same transcript can cooperate to regulate protein synthesis, although with antagonistic effects [12]. For example, translation of fibroblast growth factor 9 (FGF9) is repressed by a uORF in physiological conditions and induced by an IRES in hypoxia [28].
Drawing on new contributions from RiboSeq analyses and reporter assays, we review here the mechanisms of uORF-mediated translational regulation and show how their deregulation can cause rare human disorders.
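The uORF definition given above (an initiation codon in the 5'UTR, in frame with a termination codon located upstream or downstream of the main initiation codon) can be illustrated with a minimal sketch. The function name `find_uorfs`, the `UORF` record and the two toy sequences are hypothetical and were invented for illustration; the sketch assumes AUG-initiated uORFs only and skips upstream AUGs that are in frame with the main ORF without an intervening stop codon, treating them as N-terminal extensions rather than uORFs.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


class UORF(NamedTuple):
    start: int  # 0-based index of the A of the upstream ATG
    end: int    # index just past the uORF stop codon
    kind: str   # "upstream" (stop before main ATG) or "overlapping"


def find_uorfs(transcript: str, cds_start: int) -> List[UORF]:
    """List AUG-initiated uORFs in a transcript.

    cds_start is the 0-based index of the A of the main ORF ATG.
    """
    seq = transcript.upper().replace("U", "T")
    uorfs = []
    # The upstream ATG must lie entirely within the 5'UTR.
    for start in range(max(0, cds_start - 2)):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end <= cds_start:
                    uorfs.append(UORF(start, end, "upstream"))
                elif (cds_start - start) % 3 != 0:
                    uorfs.append(UORF(start, end, "overlapping"))
                # Otherwise: in frame with the main ORF and no stop before
                # the main ATG, i.e. an N-terminal extension (skipped).
                break
    return uorfs


# Toy transcripts (invented): main ATG at index 21 and index 7, respectively.
contained = "GGCACCATGGCTTAAGCCGCCATGGGCAAATGA"
overlapped = "CCATGGCATGGCCTAAGTGA"
print(find_uorfs(contained, 21))
print(find_uorfs(overlapped, 7))
```

Non-AUG initiation codons, which RiboSeq studies have shown to be frequent in uORFs, are deliberately left out of this sketch.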

uORFs and translational regulation
For a uORF to function as a translational regulator, its initiation codon needs to be recognized. This process requires the recruitment of the 43S preinitiation complex (PIC) to the mRNA 5' cap, which, as mentioned above, allows the scanning of the 5'UTR and the recognition of the upstream AUG (uAUG) codon to start translation (reviewed in [11]). When the ribosome reaches the uORF stop codon, one of two outcomes follows: (i) the ribosome dissociates and is recycled, which represses translation of the downstream ORF(s), or (ii) the 40S subunit remains associated with the mRNA and reinitiates translation at a downstream initiation codon [1,14,22,25,29]. Translation of a uORF may also trigger nonsense-mediated mRNA decay if the uORF stop codon is recognized as a premature translation termination codon [1]. In these circumstances, translation reinitiation at the main ORF cannot occur [1,30].
Efficient translational repression mediated by translatable uORF(s) is positively correlated with: (i) a strong uAUG context, (ii) a large distance from the 5' cap to the uAUG, (iii) a large number of uORFs, (iv) a long uORF, and (v) a short distance between the uORF and the main coding sequence (CDS) [23,31-33]. Additionally, the uORF-encoded peptide can inhibit translation by interacting with the translational machinery and stalling the translating ribosomes in a sequence-dependent manner, and/or indirectly through interactions with other small molecules [34-36]. Moreover, uORF-encoded peptides can have additional biological functions in the cell, working as trans-regulatory factors [37].
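The sequence features listed above can be computed directly from a transcript. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, and the grading of the uAUG context follows the commonly used Kozak convention (a purine at position -3 and a G at position +4, with the A of the AUG as +1), which is an assumption brought in here and not a rule stated in this review.

```python
def kozak_strength(transcript: str, atg_index: int) -> str:
    """Grade the context of the ATG whose A sits at 0-based atg_index.

    Assumed convention: purine at -3 and G at +4 -> "strong";
    only one of the two -> "adequate"; neither -> "weak".
    """
    seq = transcript.upper().replace("U", "T")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    matches = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


def uorf_features(transcript: str, start: int, end: int, cds_start: int) -> dict:
    """Summarize features reported to correlate with uORF repression.

    start/end delimit the uORF (end is just past its stop codon);
    cds_start is the index of the A of the main ATG.
    """
    return {
        "context": kozak_strength(transcript, start),
        "cap_to_uaug_nt": start,           # distance from the 5' end to the uAUG
        "uorf_length_nt": end - start,     # includes start and stop codons
        "uorf_to_cds_nt": cds_start - end, # negative when the uORF overlaps the CDS
    }


# Toy transcript (invented): uORF spans indices 6-15, main ATG at index 21.
toy = "GGCACCATGGCTTAAGCCGCCATGGGCAAATGA"
print(uorf_features(toy, 6, 15, 21))
```

These values are descriptive only; how they combine into an actual degree of repression is transcript-specific and is not modeled here.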
The cell microenvironment influences the recognition of the AUG initiation codon by the 43S PIC [11]. During stress conditions, such as hypoxia, endoplasmic reticulum (ER) stress or nutrient depletion, the eIF2α subunit is phosphorylated at serine 51 (eIF2α-P) by specific kinases [38-40]. This phosphorylation prevents eIF2 recycling by the guanine nucleotide exchange factor eIF2B, thus impairing the formation of the ternary complex and reducing the global rate of translation as part of the cell response to stress [11,31,41]. However, a group of transcripts, specifically those involved in the cell stress response, escape this global translational repression and increase their translation rates via uORF-mediated mechanisms [3,32,38-41]. In the context of high levels of eIF2α-P, uORFs are usually bypassed by the scanning ribosome, which then accesses the main AUG initiation codon [31]. This phenomenon is called leaky scanning and allows expression of, for instance, proteins involved in the ER stress response, such as the growth arrest and DNA damage-inducible protein (GADD34) and the C/EBP homologous protein (CHOP), which are encoded by mRNAs with two and one uORFs, respectively [3,20,21]. Leaky scanning in these two transcripts is mainly potentiated by a weak uAUG sequence context [3,20,21,32,42].
In other cases, translation reinitiation at the main ORF occurs after translation of at least one uORF. The yeast general control protein (GCN4) and the activating transcription factor 4 (ATF4), with four and two uORFs in their 5'UTRs, respectively, are good examples of transcripts encoding stress-related proteins whose expression depends on translation reinitiation [22,43]. Under amino acid starvation, the first uORF of GCN4 is efficiently translated, but low levels of available ternary complex force the ribosome to bypass the other uORFs, granting it the time to acquire a newly formed ternary complex and reinitiate translation at the main initiation codon [43]. In the case of ATF4, only the first uORF is translated in stress conditions and, again, the ternary complex will eventually be formed by the time the ribosome reaches the initiation codon of the main ORF [22].
The advent of RiboSeq brought new insights about translational regulation, allowing the large-scale identification of mRNAs harboring uORFs, as observed by the higher ribosome occupancy at the 5'UTR of many transcripts [5,10,44,45]. Interestingly, a great number of uORFs carrying non-AUG initiation codons has been identified, a feature that could not be computationally predicted before RiboSeq emerged. Ribosome occupancy patterns revealed that CUG is the most prevalent non-AUG initiation codon in uORFs [5,45,46]. These analyses also revealed a positive correlation between the presence of non-AUG uORFs in mRNAs and translation of their main ORF [47]. This new information raises the question of which initiation factors regulate the recognition of a non-AUG initiation codon in conditions of an overall repression of translation by eIF2α phosphorylation. eIF2A seems to act as an alternative to eIF2α-P, since it allows the recruitment of a leucine-tRNA (Leu-tRNA) to the vicinity of a CUG (or UUG) initiation codon of a uORF to start its translation. eIF2A is functionally different from eIF2α: its knockdown does not repress global translation but markedly impairs expression of uORFs containing leucine initiation codons [37,45,48]. Thus, the depletion of eIF2A compromises uORF translation and, consequently, expression of the main coding sequence of transcripts encoding stress-responsive proteins. Binding immunoglobulin protein (BiP) mRNA, which encodes a chaperone involved in the ER stress recovery response, was recently described to be regulated by this mechanism of uORF-mediated translation. BiP mRNA has two leucine initiation codons, at positions -190 (UUG) and -61 (CUG) relative to the main AUG. Upon ER stress (and hence eIF2α phosphorylation), eIF2A-dependent translation of the -190 UUG uORF is essential for expression of the BiP main ORF [48]. The same mechanism operates in mRNAs encoding proteins implicated in the tumorigenic process. In fact, eIF2A was directly associated with tumor formation in mice with squamous cell carcinoma (SCC) xenografts, a phenomenon that was accompanied by the eIF2A-dependent translation of a cohort of cancer-associated transcripts [45].
Changes in expression profiles in response to stress conditions can be a consequence of a transition of the translational machinery to different subsets of mRNAs harboring uORFs or other cis-regulatory elements. This phenomenon was also hypothesized to be due to an intrinsic regulation of the ribosome [49]. In fact, increasing evidence points to heterogeneity in ribosomal protein composition (riboproteome), ribosomal RNA (rRNA) and ribosome-associated co-factors, depending on localization, cell type and environmental conditions, as a new layer of complexity in gene expression regulation [50-52]. Furthermore, even small variations in the core ribosomal proteins seem to regulate translation of specific mRNAs by direct interaction with specific cis-regulatory elements within the 5' and 3'UTRs [49,53]. An example of this regulatory mechanism was described in Arabidopsis thaliana, where a ribosomal protein plays a critical role in translation reinitiation of polycistronic mRNAs and of mRNAs harboring uORFs [54]. This raises the possibility that, similarly to what happens in plants, ribosomal proteins can modulate uORF-mediated translation in human mRNAs and, once more, that alterations in those mechanisms may be disease-associated, making this a promising field for future study.

uORFs and human rare diseases
When modified, disrupted or created due to mutations or polymorphisms, uORFs can deregulate expression of the downstream main ORF and hence cause several pathologies, including metabolic, hematologic, endocrine and neurodegenerative disorders, and susceptibility to cancer [1]. Among them, numerous rare diseases can be found. According to European legislation, a disease is considered rare when it affects up to five people per 10,000 [55]. Several examples of deregulated uORFs associated with the onset or development of rare diseases were reviewed by Barbosa and co-workers [1]. Since then, other cases have been described (Table 1), highlighting and reinforcing the impact of uORFs in mediating translational regulation in human health and disease [56-61].

Rare disease | Gene | Pathogenesis | Reference
Familial DOPA responsive dystonia (DRD) | GCH1 | The c.-22C>T polymorphism in the 5'UTR creates an out-of-frame uORF that reduces the main ORF translation efficiency; additionally, the 73-amino acid uORF-encoded peptide has cytotoxic effects. | 56,57
Complete androgen insensitivity syndrome (CAIS) | AR | The c.-547C>T mutation in the 5'UTR creates an out-of-frame uORF that reduces the main ORF translation efficiency. | 58
Acampomelic campomelic dysplasia (ACD) | SOX9 | The c.-185G>A mutation in the 5'UTR creates an out-of-frame and overlapped uORF that reduces the main ORF translation efficiency. | 59
Multiple endocrine neoplasia syndrome type 4 (MEN4) | CDKN1B | A 4bp deletion in the uORF shifts its termination codon and impairs translation reinitiation at the main ORF. | 60

Among the rare diseases associated with the creation of a uORF is complete androgen insensitivity syndrome (CAIS), which belongs to the group of sex developmental disorders. CAIS is characterized by low levels of androgen receptor (AR), which impair the response to the androgen dihydrotestosterone, compromising the male phenotype [62,63].
In 1994, when studying the regulation of AR expression, a group of investigators showed that the 5'UTR is involved in its translational regulation and hypothesized that mutations in this 5'UTR could explain, in part, the etiology of androgen insensitivity syndrome, since AR mRNA levels are maintained despite the reduced protein levels [64].
Only in 2016 did next-generation sequencing identify a germline mutation in the 5'UTR of the AR gene (c.-547C>T) that was proven to create a translatable uORF responsible for the low AR protein levels in CAIS. In the mutant AR transcript, the 43S preinitiation complex recognizes and initiates translation at the uAUG, leading to the formation of a small peptide. Then, the ribosome dissociates, resulting in a low rate of translation reinitiation at the main ORF [58]. Other examples of rare diseases that can be related to the creation of uORFs include familial DOPA responsive dystonia (DRD) and acampomelic campomelic dysplasia (ACD) [56,59]. In the first case, the polymorphism c.-22C>T in the 5'UTR of the human guanosine triphosphate cyclohydrolase 1 (GCH1) gene creates an out-of-frame uORF that encodes a 73-amino acid peptide, impairing translation of the main ORF [56,57]. The resulting low expression of GCH1 impairs the dopamine biosynthesis pathway, which ultimately results in reduced levels of dopamine and dopaminergic dysfunction in the brain, typical of DRD. In addition, the synthesized 73-amino acid peptide localizes to the nucleus, where it exerts cytotoxic effects that are accentuated by proteasome impairment [57]. Regarding ACD, a de novo mutation, c.-185G>A, in the transcription factor SRY-box 9 (SOX-9) gene creates an overlapped uORF that reduces SOX-9 translation and is responsible for ACD, the milder phenotype of campomelic dysplasia (CD) [59].
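The variants discussed above share one mechanism: a single-nucleotide substitution in the 5'UTR generates a new upstream AUG. A minimal sketch of how such a candidate could be screened is given below. The function name `creates_uatg` and the toy transcript are hypothetical; the sequence is invented and is not the 5'UTR of GCH1, AR or SOX9. Positions follow the HGVS coding convention, in which c.-1 is the nucleotide immediately 5' of the A of the main ATG.

```python
from typing import Optional, Tuple


def creates_uatg(transcript: str, cds_start: int, hgvs_pos: int,
                 ref: str, alt: str) -> Optional[Tuple[int, bool]]:
    """Test whether a 5'UTR substitution at c.<hgvs_pos> creates a new ATG.

    cds_start is the 0-based index of the A of the main ATG and hgvs_pos is
    negative (e.g. -22 for c.-22). Returns (index of the new ATG, whether it
    is in frame with the main ORF), or None if no upstream ATG is created.
    """
    if hgvs_pos >= 0:
        raise ValueError("hgvs_pos must be negative (5'UTR position)")
    seq = transcript.upper().replace("U", "T")
    idx = cds_start + hgvs_pos
    if idx < 0 or seq[idx] != ref.upper():
        raise ValueError("reference base does not match the transcript")
    mutant = seq[:idx] + alt.upper() + seq[idx + 1:]
    # Only the three codon windows covering the changed base can be affected.
    for start in range(max(0, idx - 2), idx + 1):
        if start < cds_start and mutant[start:start + 3] == "ATG" \
                and seq[start:start + 3] != "ATG":
            return start, (cds_start - start) % 3 == 0
    return None


# Toy transcript (invented): 13-nt 5'UTR followed by the main ORF ATGGCCTAA.
toy = "GGACGGCCTGAGCATGGCCTAA"
print(creates_uatg(toy, 13, -10, "C", "T"))  # c.-10C>T turns ACG into ATG
print(creates_uatg(toy, 13, -1, "C", "T"))   # c.-1C>T creates no ATG
```

A positive result only flags a candidate uAUG; whether it is translated and represses the main ORF still has to be established experimentally, for instance with the reporter assays cited in this review.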
A new example of a disease whose phenotype is related to mutations that disrupt uORFs is multiple endocrine neoplasia syndrome type 4 (MEN4). A 4 base pair (4bp) deletion in the sequence of the highly conserved uORF of the cyclin-dependent kinase inhibitor 1B (CDKN1B) gene was reported to lengthen the uORF by shifting its stop codon and to reduce the intercistronic space. This event seems to prevent translation reinitiation at the main ORF and therefore the expression of p27KIP1, a tumor suppressor with a crucial role in cell cycle and proliferation regulation that, when downregulated, increases susceptibility to tumor development [60].
In addition to alterations that create or disrupt uORFs in a disease context, there are genetic alterations that can indirectly impair uORF-mediated translational regulation. One example is Shwachman-Diamond syndrome (SDS), a rare congenital disease caused by mutations in the Shwachman-Bodian-Diamond syndrome (SBDS) gene [65]. This disorder is a ribosomopathy, since defective SBDS protein impairs maturation of the large ribosomal subunit [61,66]. Due to its function, SBDS can regulate translation of other transcripts, such as CCAAT/enhancer-binding protein-α (C/EBPα) and -β (C/EBPβ), involved in granulocyte differentiation [61,67]. Both transcripts have alternative initiation codons that result in three different N-terminal protein isoforms: extended, p42 and p30 for C/EBPα, and LAP*, LAP and LIP for C/EBPβ [14,61,67]. Low levels of SBDS expression impair the translation of the C/EBPα-p30 and C/EBPβ-LIP truncated proteins, which can explain the hematological phenotype of SDS, consisting of bone marrow failure with neutropenia [61,67,68]. C/EBPα and C/EBPβ mRNAs each have a uORF within their 5'UTR, and Kyungmin and coworkers showed that SBDS is crucial for uORF-mediated translation reinitiation of C/EBPα-p30 and C/EBPβ-LIP. Thus, although the origin of SDS does not depend on alterations in the uORF sequence, the hematological picture is determined by disruption of a uORF-dependent translational mechanism [61]. This prompts us to look beyond the uORF sequence context and to understand the processes, factors and/or networks that drive uORF-mediated translational regulation.

Conclusions
Growing evidence from RiboSeq analyses and reporter assays has brought new insights about the existence of uORFs and their translational regulatory functions, reinforcing the importance of these cis-acting regulatory elements in the pathophysiology of several human disorders, including rare diseases. In addition, recent studies have revealed many uORFs harboring non-AUG initiation codons and alternative mechanisms of translation initiation associated with pathological conditions. These new data emphasize the importance of understanding the detailed molecular mechanisms on which a disease relies in order to develop and implement new strategies for disease diagnosis and treatment.

Table 1. Recently described cases of human rare diseases associated with deregulated uORF-mediated translation.