Sequencing 100,000 human genomes

A world-leading project to sequence 100,000 human genomes could get thousands of families the diagnosis they need. Genomics England’s Chief Scientist tells Sarah Kidner more.

This article was originally published in 2015, when Professor Caulfield was interviewed. It was updated in February 2019.

“The human genome is the blueprint for the people that we are,” says Professor Mark Caulfield, Chief Scientist for Genomics England.

More importantly, our genomes contain vital information about our susceptibility to certain conditions, including cardiovascular disease, cancer and rare diseases.

“The genome contains about 3.3 billion ‘letters’, and in every 300 or so there’s a change to one that can make us more susceptible to a disease or, if passed on from generation to generation, can cause us to inherit specific diseases,” explains the professor. “Sometimes there will be extra bits, or bits missing, and periodically bits will be moved around or not in the right place.”
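
Taken together, the professor's two figures imply a rough count of such changes in each genome. The sketch below works through that arithmetic, assuming the "one in every 300 or so" rate applies evenly across all 3.3 billion letters, which is a simplification. The function name is made up for illustration.

```python
# Figures quoted in the article.
GENOME_LETTERS = 3_300_000_000   # "about 3.3 billion letters"
LETTERS_PER_CHANGE = 300         # "in every 300 or so there's a change"


def estimated_changes(genome_letters: int, letters_per_change: int) -> int:
    """Rough number of changes if one occurs every `letters_per_change` letters."""
    return genome_letters // letters_per_change


print(f"{estimated_changes(GENOME_LETTERS, LETTERS_PER_CHANGE):,}")  # 11,000,000
```

On that assumption, a single genome would carry around 11 million changes, which gives a sense of why pinpointing the few that matter is hard.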

Professor Caulfield spearheaded an ambitious programme to sequence 100,000 genomes, which was completed in late 2018. The goal was to identify the specific changes responsible for triggering both rare diseases and more common ones, such as cancer and heart disease.

This information allows us to tell thousands of families why their loved ones are susceptible to as-yet-undiagnosed conditions. One in four of those who had their genomes sequenced in the project were given a diagnosis for the first time as a result. “It’s about getting information to patients, mums and dads, and families who can’t get it right now from currently available technologies in the health service,” says Professor Caulfield.

The UK is uniquely placed to lead on a project of this scale. “There is an opportunity for Britain, with a unified healthcare system, free at the point of delivery, to transform the application of genomics in medicine,” says Professor Caulfield.

“We have reached a point where developments in technology and chemistry allow us to sequence an entire human genome in a few days for the relatively low cost of circa £1,000. The Human Genome Project revealed there were only about 20,000 genes that code for the proteins that make us who we are – about the same as a starfish. The role of the remainder, around 95 per cent of it, was a mystery.”

“We now know that the remaining DNA plays a critical role in determining how and when these proteins are produced, which is why it’s important to sequence the entire genome.”

Rare discoveries

Sequencing the whole genome of people with a rare disease could help identify atypical variants in the genetic code that actually cause diseases. But these rare variants are hard to find.

“By a rare variant, we mean a genetic variant in your genome that occurs in less than one per cent of the population, and sometimes even more rarely than that,” says Professor Caulfield. “Each of us has many, many rare variants. Getting a clear line of sight on the ones that cause rare inherited heart conditions, for example, can be very challenging, so we need some help.”
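
The one per cent threshold in that definition can be expressed as a simple filter. This is a minimal sketch, assuming we already know how common each variant is in the population. The variant names and frequencies are invented for illustration and are not project data.

```python
RARE_THRESHOLD = 0.01  # "less than one per cent of the population" (from the article)


def rare_variants(frequencies: dict[str, float],
                  threshold: float = RARE_THRESHOLD) -> list[str]:
    """Return the names of variants whose population frequency is below the threshold."""
    return sorted(name for name, freq in frequencies.items() if freq < threshold)


# Hypothetical variants and the fraction of the population carrying each one.
example = {
    "variant_a": 0.25,     # common
    "variant_b": 0.004,    # rare
    "variant_c": 0.0001,   # very rare
    "variant_d": 0.01,     # exactly one per cent, so not "less than"
}

print(rare_variants(example))  # ['variant_b', 'variant_c']
```

A filter like this only narrows the field: as the professor notes, each of us carries many rare variants, and most of them are harmless.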

That help came from families who have a rare and undiagnosed disease, ideally two parents with an affected child, allowing us to track the rare variant through generations. Participants came via the NHS and enrolled in the programme through NHS Genomic Medicine Centres.

The first wave of these was announced in December 2014. “These centres will offer whole genome sequence to patients with rare inherited diseases who have not obtained genetic diagnoses from existing tests in the NHS. They will have time to think about it before they enrol, with informed consent,” says Professor Caulfield.

Typically, a rare, inherited disorder is defined as one that affects fewer than five people per 10,000. For the 100,000 Genomes Project, the definition was broader. “I decided when we started that we wouldn’t confine rare diseases to that definition, because it wouldn’t include less rare but very important disorders such as familial hypercholesterolaemia (FH). This affects about one in 250 people and is a major cause of premature heart attacks,” says Professor Caulfield.

The project has observed disorders that affect the heart muscle, including cardiomyopathies and rare disorders of heart rhythm. Professor Caulfield explains: “The mums and dads that are enrolling their children in this programme know that only by understanding the genetic basis of rare diseases do we have much hope of designing better treatments for them.”

Facts and figures

100,000 whole genomes from NHS patients were sequenced, completing in late 2018

4 base components make up our DNA

220GB the amount of data occupied by a single genome

3.3 billion letters in a single human genome

5,000 to 8,000 the estimated number of rare diseases

1 in 250 people are affected by familial hypercholesterolaemia

100,000 sequences

Some parts of the genetic code can be difficult to read, so each genome may need to be read more than 30 times before it can be interpreted successfully. “Reading the genome once is not enough, because you might miss bits,” says Professor Caulfield. “It’s like reading a book: periodically you get to a sentence and think, ‘How did I get there?’ You have to go back and read a bit again because what you’re reading now doesn’t make sense. You realise you have missed a bit. It’s the same reading a gene sequence.”
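
The value of repeated reading can be shown with a toy example. The sketch below assumes each pass over the same short stretch of DNA may miss a letter (marked "-") or misread one, and takes a majority vote at each position. It illustrates the idea only and is not the project's actual method. The reads are invented.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across repeated reads of the same stretch.

    A '-' marks a letter the read missed. Positions that no read covered
    are reported as 'N' (unknown).
    """
    if not reads:
        return ""
    length = max(len(read) for read in reads)
    result = []
    for i in range(length):
        # Collect every letter actually observed at this position.
        letters = [read[i] for read in reads if i < len(read) and read[i] != "-"]
        if letters:
            result.append(Counter(letters).most_common(1)[0][0])
        else:
            result.append("N")
    return "".join(result)


# Four imperfect passes over the same six letters (hypothetical data).
reads = ["ACG-TA", "AC-TTA", "A-GTTA", "ACGTCA"]
print(consensus(reads))  # ACGTTA
```

No single read here is complete and correct, yet together they recover the full sequence. With only one read, the gaps and the misread letter would go unnoticed.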

Once read, the genomes are added to a databank. Each genome generates around 220GB of data (the storage capacity of 14 average iPhones), so a special data centre was built to store them, using a £24m investment from the Medical Research Council. The data, Professor Caulfield explains, is in two parts.

“One will retain an identity, because it’s important we can feed back to patients. A second data centre will store data in a non-identifiable format. We often call this anonymised or pseudonymised data. That data store will be 30 petabytes in size and will have about 30,000 dual processors.”
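
The storage figures can be checked with some quick arithmetic. This is a back-of-envelope sketch, assuming decimal units (one petabyte equals one million gigabytes) and assuming every one of the 100,000 genomes takes the quoted 220GB. It says nothing about how the data is really laid out.

```python
# Figures quoted in the article.
GB_PER_GENOME = 220
GENOMES = 100_000
STORE_PETABYTES = 30

GB_PER_PETABYTE = 1_000_000  # decimal units, an assumption


def total_petabytes(genomes: int, gb_per_genome: int) -> float:
    """Total storage for all genomes, in petabytes."""
    return genomes * gb_per_genome / GB_PER_PETABYTE


needed = total_petabytes(GENOMES, GB_PER_GENOME)
print(needed)                    # 22.0
print(STORE_PETABYTES - needed)  # 8.0
```

On these assumptions the genomes alone come to about 22 petabytes, which fits within the 30-petabyte store the professor describes with some room to spare.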

Scientists and clinicians have access to the raw data, or the genome sequence, as it comes out of the machine that captured it. The newly captured genomes are aligned to a control genome, ensuring the genome is reassembled in the right order.

Researchers can compare the genome of someone with a rare disease to those of others without it, to find potentially noteworthy variants. Work will be divided into areas.

“There’ll be a cardiovascular domain, with UK leadership,” says Professor Caulfield. “Within each domain, a series of subsets will focus on specific diseases. Some researchers might work on Marfan syndrome, some on FH, some on familial hypertrophic cardiomyopathy, and other disorders as well.”

Giving experts access to the data store lets them compare characteristics across genomes, so they can say with greater confidence whether these are likely to be characteristics of a rare disease.
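
The comparison described above can be sketched with sets. This minimal example assumes each person's genome has already been reduced to a set of variant labels. It keeps the variants shared by everyone with the disease and absent from everyone without it. The labels and the function name are hypothetical, and real analyses are far more involved.

```python
def candidate_variants(affected: list[set[str]],
                       unaffected: list[set[str]]) -> set[str]:
    """Variants carried by every affected person and by no unaffected person."""
    if not affected:
        return set()
    shared_by_affected = set.intersection(*affected)
    seen_in_unaffected = set().union(*unaffected)
    return shared_by_affected - seen_in_unaffected


# Hypothetical variant labels for two people with a condition and two without.
affected = [{"v1", "v2", "v3"}, {"v2", "v3", "v4"}]
unaffected = [{"v3", "v5"}, {"v1"}]

print(candidate_variants(affected, unaffected))  # {'v2'}
```

Here only "v2" survives: "v3" is shared by both affected people but also appears in someone without the condition. The more genomes are available for comparison, the more confidently such candidates can be narrowed down.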

Lifelong screening

Researchers plan to follow project participants through the course of their lives. This will, says Professor Caulfield, give us a clearer picture of how rare disorders progress through middle and later life, providing further clues about treatment.

The programme was launched in 2012 and finished in late 2018, leaving a legacy of Genomic Medicine Centres and a state-of-the-art sequencing centre.

In addition, the project allocated £25m to provide 700 person-years of education in the form of short courses, PhDs and master’s degrees. These will “drive up the cadre of people able to use this technology in the healthcare system,” says Professor Caulfield. The first master’s courses began in 2015.

From the initial stages of the project, Professor Caulfield knew the impact could be far-reaching. “You might ask – why invest in these diseases if they’re rare?” he says. “But each individual disease affects five in 10,000 people, and there are 7,000 of these rare diseases, so collectively they affect around three million people in the UK. This programme is designed to get all of them a genomic diagnosis for the first time.”

Sequencing success

The 100,000 Genomes Project has reached its goal of sequencing 100,000 whole genomes. That success would not have been possible without the 85,000 participants, 1,500 NHS staff and more than 3,000 researchers who played a role in reaching it.

The project’s participants even included the BHF’s Chief Executive, Simon Gillespie, who took part to help explain his inherited, unexplained high cholesterol.

The project has given one in four patients a life-changing diagnosis for the first time. The UK has become the first nation to use whole genome sequencing in direct healthcare, and has laid the foundations for an NHS Genomic Medicine Service, which will give the public easier access to genomic testing from 2019.

Professor Mark Caulfield

Professor Caulfield is Chief Scientist at Genomics England, Professor of Clinical Pharmacology at Barts and The London School of Medicine and Dentistry, and Director of the William Harvey Research Institute. He specialises in cardiovascular genomics and has led a number of studies into hypertension.
