RNU4-2: the small gene with a very big impact

Written by Nicky Whiffin


Recently, we published a paper in the international scientific journal Nature showing that genetic variants in the RNU4-2 gene are a prevalent cause of undiagnosed neurodevelopmental disorders.

The aim of this blog post is to break down the important aspects of this work for those who are not used to reading academic papers or who may not be that familiar with genetics, as well as to highlight some significant implications of this discovery.

What are neurodevelopmental disorders?

The term neurodevelopmental disorders (NDDs) is used to collectively describe conditions that develop in early childhood and that involve changes to how the brain functions that impact learning, behaviour, speech, and movement. Individuals with neurodevelopmental disorders can develop seizures, and organs other than the brain can also be affected. Around one in every 200 children born has a severe neurodevelopmental disorder.

Why do some individuals have NDDs?

Most neurodevelopmental disorders are genetic; that is, they are caused by changes to DNA. DNA is made up of building blocks called ‘nucleotide bases’. There are four DNA bases which can be abbreviated to the letters A, C, G, and T. The full sequence of our DNA is over three billion of these letters long. A simple change to this sequence (called a ‘variant’), such as changing one letter or base (from a T to a G for example) or adding or removing a single base, can cause a neurodevelopmental disorder. DNA changes that cause a genetic condition are called ‘pathogenic variants’.

A genetic variant is a change in the DNA sequence. This example shows a change in the sequence of letters where one of the Ts has been changed to a G.

When a child is thought to have a neurodevelopmental disorder, they are often offered genetic testing, where their DNA is analysed to try to identify the specific genetic variant that causes their condition. This analysis is not straightforward: each of us has millions of genetic variants (they are what makes us different from each other), so pinpointing the exact DNA change that is responsible is challenging. If neither of the child’s parents has the condition, the search focuses in part on differences in the child’s DNA compared to their parents. We call these DNA changes that appear in a family for the first time ‘de novo variants’.
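
For readers who like to see ideas as code, the comparison between a child and their parents can be sketched roughly as follows. This is a toy illustration only: the variants listed are invented, the function name find_de_novo is made up for this post, and real analysis pipelines are far more involved.

```python
# Hypothetical toy data: each variant is written as
# (position in the DNA, original base, new base).
child = {(101, "T", "G"), (250, "A", "C"), (777, "G", "A")}
mother = {(101, "T", "G")}
father = {(250, "A", "C")}


def find_de_novo(child_variants, mother_variants, father_variants):
    """Return the variants seen in the child but in neither parent."""
    return child_variants - mother_variants - father_variants


de_novo = find_de_novo(child, mother, father)
print(de_novo)  # {(777, 'G', 'A')}
```

Here two of the child's three variants were passed down from a parent, so only the third is flagged as de novo.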

Each of us has many genetic variants, including around 70-80 of these de novo variants, and yet most of us do not have neurodevelopmental disorders. It is only when these variants occur in particularly important parts of our DNA that they cause a genetic disorder. Many of these important regions are called ‘genes’. Our genes are the instructions for making the molecules that have important roles in our cells. We have tens of thousands of these genes and DNA changes in hundreds of them are known to cause developmental disorders. Most of these disorders are very rare.

For around 60% of individuals with neurodevelopmental disorders, the genetic cause of their condition is not identified after genetic testing. These individuals are said to be ‘genetically undiagnosed’.

What did we find?

In this paper, we identified that de novo variants in a gene called RNU4-2 cause a neurodevelopmental disorder that had not been described before.

This finding enabled us to identify the genetic cause of disease in a subset of individuals who were previously genetically undiagnosed. We included 115 of these individuals in the paper.

What is special about RNU4-2?

A subset of our genes (around 20,000) are instructions for making proteins. Nearly all of the genes in which DNA changes are known to cause neurodevelopmental disorders are these ‘protein-coding’ genes, and these genes receive the most attention in research. To make a protein, each gene is first copied into RNA (through a process called transcription), before that RNA is used as the template (mRNA) to make protein (through a process called translation). 

But there are also tens of thousands of genes that are not protein-coding. These genes are copied into RNA, but that RNA is not a template to make a protein. Instead, these RNA molecules, which are collectively termed ‘non-coding RNAs’, have functions in our cells and organs. RNU4-2 is one of these genes. It is the instructions to make a small RNA called U4. RNU4-2 is only 141 DNA bases (or letters) long whereas most protein-coding genes are thousands of bases long. 

Genes are instructions for making molecules. Transcription is the process by which DNA is copied into RNA. To make a protein, the ‘messenger’ RNA (mRNA) is then translated.

The other striking thing is the number of individuals who have neurodevelopmental disorders caused by variants in RNU4-2. I was asked by Ian Sample at the Guardian newspaper (who covered this story here) “is it unusual to discover a new developmental disorder?” to which I replied “no, but it is unusual to identify one that is this common”. We estimate that DNA changes in RNU4-2 could be responsible for one in every 250 neurodevelopmental disorder diagnoses. This suggests that more than one thousand people in the UK and more than a hundred thousand individuals worldwide could have a neurodevelopmental disorder caused by variants in RNU4-2.
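
As a rough sanity check on those numbers, the arithmetic can be sketched like this. The population figures are round assumptions of mine rather than values from the paper, and applying the one-in-200 figure for children to a whole population is a simplification, so treat this as a back-of-the-envelope illustration and not as the method used in the study.

```python
# Assumed round population figures (not taken from the paper).
UK_POPULATION = 67_000_000
WORLD_POPULATION = 8_000_000_000

NDD_ONE_IN = 200      # severe NDD: about one in every 200 children (from this post)
RNU4_2_ONE_IN = 250   # RNU4-2: about one in every 250 NDD diagnoses (from this post)


def estimated_cases(population):
    """Crude estimate of how many people could have an RNU4-2 disorder."""
    return population / (NDD_ONE_IN * RNU4_2_ONE_IN)


uk_cases = estimated_cases(UK_POPULATION)
world_cases = estimated_cases(WORLD_POPULATION)
print(round(uk_cases), round(world_cases))  # 1340 160000
```

Both results are in line with the "more than one thousand" and "more than a hundred thousand" figures quoted above.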

But if it is so common, then why did we not already know about it?

To look for genetic variants in an individual we use a technology called ‘DNA sequencing’. Put simply, this process allows us to look at the exact sequence of the DNA in each individual (i.e. to determine the sequence of the letters A, C, G, and T). 

It is expensive to look at every single base in a person’s DNA, so we often only look at parts of it. The parts of DNA that are the protein-coding genes are often considered the most important and consequently the most likely place to find DNA changes that cause disease. These protein-coding regions make up only around 1.5% of our total DNA. We can look at the DNA sequence in these regions using an approach called ‘exome sequencing’ (sometimes called ‘whole exome sequencing’ or ‘WES’). Previous research studies (such as the Deciphering Developmental Disorders, or DDD study) have used exome sequencing to look for DNA changes in protein-coding genes that cause developmental disorders. This approach has been very successful: by looking at only 1.5% of the DNA sequence we can find the disease-causing genetic variant in around 40% of all individuals [1]. But exome sequencing misses variants in non-coding genes, including RNU4-2. Moving forward, it is crucial that the genetic testing offered to individuals with neurodevelopmental disorders is able to capture this diagnosis.

So how did we discover this now?

Increasingly, ‘genome sequencing’ (sometimes called ‘whole genome sequencing’ or ‘WGS’), which looks at every base of DNA, is being used for individuals where a genetic diagnosis has not been found using exome sequencing. In some countries, including in the UK, genome sequencing has even replaced exome sequencing for individuals with neurodevelopmental disorders. About ten years ago, the Genomics England (GEL) 100,000 Genomes Project was established as one of the first large-scale genome sequencing projects. It recruited individuals with rare conditions or cancer from the UK National Health Service (NHS). This included over 12,000 individuals with neurodevelopmental disorders, many of whom (about 9,000) didn’t yet have a genetic diagnosis when we started our study (in early 2024). For a large proportion of these individuals, the project also has genome sequencing data from both of their parents, which means we can look for those de novo variants. We decided to focus on de novo variants in non-coding genes, thinking that variants in some of these genes might cause disease.

De novo variants are ‘chance’ events that happen randomly in a person’s DNA. We therefore rarely see the exact same variant in two individuals. If two variants happen to occur in the same gene and disrupt it in a similar way, then they can cause the same condition, but we mostly expect the exact pathogenic variants in a gene to be different in each individual. We were surprised, therefore, when we spotted the exact same de novo variant in 46 individuals. We initially assumed that it might be a mistake or error (i.e., not a real variant) associated with either the sequencing technology or the computational tools that are used to identify variants (neither is perfect). But if it were an error, we would expect it to occur at random in individuals across the dataset; that is, we would expect those 46 individuals to have a range of different disorders.

But all 46 individuals had neurodevelopmental disorders, and did not yet have a genetic diagnosis. 

We did not see the variant in around 55,000 individuals in the Genomics England database who did not have a neurodevelopmental disorder, or in any individuals who already had a genetic diagnosis. The fact that this variant was found solely in individuals with undiagnosed neurodevelopmental disorders is incredibly unlikely to happen by chance.
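
To give a feel for just how unlikely, here is a rough calculation. It is my own illustration and not the statistical test used in the paper: it uses the approximate group sizes quoted in this post, assumes about 3,000 individuals with an existing diagnosis (12,000 minus 9,000), and pretends the 46 carriers were picked at random from everyone.

```python
from math import comb

# Approximate group sizes taken from the round numbers in this post.
UNDIAGNOSED_NDD = 9_000
EVERYONE_ELSE = 55_000 + 3_000  # no NDD, plus NDD with a diagnosis (assumed 12,000 - 9,000)
CARRIERS = 46


def prob_all_in_group(group_size, total, carriers):
    """Chance that `carriers` people picked at random all fall in one group."""
    return comb(group_size, carriers) / comb(total, carriers)


p = prob_all_in_group(UNDIAGNOSED_NDD, UNDIAGNOSED_NDD + EVERYONE_ELSE, CARRIERS)
print(f"{p:.1e}")
```

Under these simplifying assumptions the chance comes out far smaller than one in a billion billion billion, which is why a random error is such a poor explanation.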

This variant is referred to as RNU4-2 n.64_65insT. This means that it is an addition (or ‘insertion’) of a T between positions 64 and 65 in the sequence of the RNU4-2 gene.
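
The same idea can be written as a few lines of code. This is a minimal sketch: insert_between is a name made up for this post, and the sequences below are placeholders, not the real RNU4-2 sequence. Only the length of 141 bases comes from the text above.

```python
def insert_between(sequence, left_pos, right_pos, inserted):
    """Insert bases between two adjacent positions, counting from 1."""
    if right_pos != left_pos + 1:
        raise ValueError("positions must be next to each other")
    if not 1 <= left_pos < len(sequence):
        raise ValueError("positions must lie within the sequence")
    return sequence[:left_pos] + inserted + sequence[left_pos:]


# Short made-up sequence, NOT the real gene.
toy = "ACGTACGTAC"
print(insert_between(toy, 4, 5, "T"))  # ACGTTACGTAC

# Placeholder of the right length (141 bases), with N standing for "any base".
placeholder = "N" * 141
mutated = insert_between(placeholder, 64, 65, "T")
print(len(mutated))  # 142
```

The sequence ends up one base longer, with the new T sitting immediately after the base at position 64.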

This variant is in an important part of the RNU4-2 gene

After spotting this variant, we looked more closely at the rest of the RNU4-2 gene. We noticed that there were other variants very close to the initial variant in other individuals with undiagnosed neurodevelopmental disorders. In total, we found 61 individuals in the Genomics England 100,000 Genomes Project with variants in a region in the middle of the gene that is 18 bases long. We noticed that variants in this small region almost never occur in healthy individuals. 

Through global collaboration, including researchers and clinicians in the US, Europe, and Australia, we very quickly identified 54 more individuals, all with variants in this same 18-base-long region of the RNU4-2 gene.

We collated all of the clinical information we could access for each of these individuals, including where possible contacting clinical teams involved in the care of each patient, to look at clinical features that were shared among the patients. We found that all of the patients had intellectual disability and delayed development. Many were short in stature, had a smaller-than-average head circumference, difficulties feeding, reduced muscle tone, seizures, reduced or absent speech, and brain anomalies. 

Why do variants in this gene cause this neurodevelopmental disorder?

The RNU4-2 gene provides the instructions to make a small nuclear (sn) RNA called U4, or sometimes U4 snRNA. This U4 snRNA is involved in a process called ‘splicing’. The DNA sequence of most genes is very long and not all of it is used to make the molecule (either protein, or non-coding RNA) that the gene encodes. After the gene is copied into RNA, the sections of the RNA that are not needed (called ‘introns’) are removed and the remaining sections (called ‘exons’) are stitched back together. This process of chopping out and stitching together is called ‘splicing’. The U4 RNA is one of many proteins and RNAs that are needed for splicing to happen correctly.

U4 is part of a specific group of RNAs and proteins, together called the spliceosome, which is responsible for recognising one end of each intron (specifically the end closest to the start of the gene). The variants in RNU4-2 result in the wrong position sometimes being used at the end of the intron, causing extra bases to be added to or removed from the final RNA template (the messenger RNA, or mRNA). As proteins are made from these mRNA templates, this can result in important proteins being incorrectly made or in some proteins not being made at all. We don’t know yet exactly which genes are incorrectly spliced or how often, but we expect that many will be templates for proteins that are important in development.

Splicing is the process through which introns are removed from RNA. It is mediated by a large collection (or ‘complex’) of proteins and non-coding RNAs, called the spliceosome. U4 snRNA is one of the molecules that makes up the spliceosome.
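
For those who find code helpful, splicing can be pictured as cutting a string and joining the pieces. The sketch below is purely illustrative: the RNA sequence is invented, the function splice is made up for this post, and real splicing is carried out by the spliceosome, not by fixed coordinates.

```python
def splice(pre_rna, exons):
    """Join the exon pieces together, dropping everything in between (the introns).

    `exons` is a list of (start, end) pairs, counting from 0, end not included.
    """
    return "".join(pre_rna[start:end] for start, end in exons)


# Made-up RNA: exons in upper case, the intron in lower case for readability.
pre_rna = "AUGGCC" + "guaaguuuucag" + "UUAUGA"

# Correct splicing removes the whole intron.
correct = splice(pre_rna, [(0, 6), (18, 24)])

# Using the wrong position at the intron end nearest the start of the gene
# leaves two extra bases in the final RNA.
shifted = splice(pre_rna, [(0, 8), (18, 24)])

print(correct)  # AUGGCCUUAUGA
print(shifted)  # AUGGCCguUUAUGA
```

Even a shift of a couple of bases changes the final RNA, which is one way a protein made from it can end up incorrect.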

How does this new knowledge benefit patients and their families?

Since initially reporting our findings in a preprint article (before the publication in Nature referred to at the start of this article), we have heard from many families about the impact of an RNU4-2 diagnosis. For many, this has been an end to a long ‘diagnostic odyssey’ where they have had many clinical appointments and diagnostic tests, often including genetic tests which had not looked for changes in the RNU4-2 gene. Some may have started to believe they would never know the genetic cause of their child’s neurodevelopmental disorder. For many families a genetic diagnosis is a huge relief and acknowledgement that they are not to blame for their child’s delays or features and it wasn’t anything they did before or during pregnancy that caused the disorder. For others it can help them and other family members make a decision to have another child: because the variant is de novo, meaning it hasn’t been inherited from either parent, the chance of the condition happening again is extremely low. But perhaps the biggest impact families have experienced so far is the opportunity to connect with other families, to realise they are not alone in this journey, and to learn how the future may look for their child.

“So, this feeling that like we’ve been on this deserted island for eight years and now all of a sudden, you’re sort of looking around through the branches of the trees. It’s like, wait a minute, there are other people on this island and in this case actually there's a lot more people on this island.”

Quote from Lindsay Pearse from the Behind the Genes podcast produced by Genomics England.

This discovery also provides hope for the development of treatments for this condition in the future. There are still a lot of questions to be answered, and there is a long road ahead, but knowing the genetic cause of this disorder is the first step along this road. There is now the opportunity to initiate new research, both into potential treatments, but also into other aspects of RNU4-2 biology, that could help many patients in the future.

“After a lonely 13-year diagnostic odyssey searching for the cause of my son’s medical challenges, learning about this discovery has been life-changing! One parent beautifully described this diagnosis as the light that has illuminated our journeys and that’s exactly how I feel. This has opened the door for treatments to be developed and provides new hope. We can finally build a united community, raise awareness and help advance research toward a brighter future for our children.”

Quote from RNU4-2 mum Jessica who set up a Facebook group to connect families with an RNU4-2 diagnosis.

What does this mean for patients and families who are still without a genetic diagnosis?

This is not the first time that variants in a non-coding gene have been identified as the cause of a rare condition. Other examples include the snRNA genes RNU12 [2] and RNU4ATAC [3]. Conditions caused by variants in these other genes are, however, incredibly rare. This is not the case for variants in RNU4-2. We hope that this discovery will encourage researchers to look beyond protein-coding genes and will lead to identification of other conditions caused by changes to non-coding genes.

Where can I get more information?

If your family has received an RNU4-2 diagnosis you can get support from Unique, or join the RNU4-2 Family Connect group on Facebook.

Families, researchers, clinicians and anyone else interested in following developments in RNU4-2 research can follow the RNU4-2 United Informational Page on Facebook.

You can read our Nature paper here, and a commentary piece on it here.

Our work was covered by Ian Sample in the Guardian and has been written up here.

References:

[1] Wright CF et al. Genomic Diagnosis of Rare Pediatric Disease in the United Kingdom and Ireland. NEJM 2023

[2] Xing C et al. Biallelic variants in RNU12 cause CDAGS syndrome. Human Mutation 2021

[3] https://omim.org/entry/601428

All figures in this post were created using BioRender.com
