Sequencing 100,000 human genomes


A world-leading project to sequence 100,000 human genomes could get thousands of families the diagnosis they need. Genomics England’s Chief Scientist tells Sarah Kidner more.

“The human genome is the blueprint for the people that we are,” says Professor Mark Caulfield, Chief Scientist for Genomics England.

More importantly, our genomes contain vital information about our susceptibility to certain conditions, including cardiovascular disease, cancer and rare diseases.

“The genome contains about 3.3 billion ‘letters’, and in every 300 or so there’s a change to one that can make us more susceptible to a disease or, if passed on from generation to generation, can cause us to inherit specific diseases,” explains the professor. “Sometimes there will be extra bits, or bits missing, and periodically bits will be moved around or not in the right place.”
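As a rough back-of-the-envelope check on the figures quoted above, the short sketch below divides the genome's length by the quoted spacing between changes. The constant and function names are illustrative only, and the result is an order-of-magnitude estimate rather than a measured count.

```python
# Rough arithmetic on the figures quoted by Professor Caulfield.
GENOME_LETTERS = 3_300_000_000  # "about 3.3 billion letters"
LETTERS_PER_CHANGE = 300        # "in every 300 or so there's a change"


def estimated_changes(letters: int = GENOME_LETTERS,
                      spacing: int = LETTERS_PER_CHANGE) -> int:
    """Approximate number of changed positions, assuming one per `spacing` letters."""
    return letters // spacing


print(f"{estimated_changes():,}")  # 11,000,000
```

On these numbers, a single genome would carry roughly 11 million changed letters, which gives a sense of why picking out the few that matter is difficult.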

Professor Caulfield is advising on an ambitious programme that will sequence 100,000 genomes by 2017. The aim is to identify the specific changes responsible for triggering both rare diseases and more common ones, such as cancer and heart disease.


This information could allow us to tell thousands of families why their loved ones are susceptible to as-yet-undiagnosed conditions. “It’s about getting information to patients, mums and dads, and families who can’t get it right now from currently available technologies in the health service,” says Professor Caulfield.

The UK is uniquely placed to lead on a project of this scale. “There is an opportunity for Britain, with a unified healthcare system, free at the point of delivery, to transform the application of genomics in medicine,” says Professor Caulfield.

“We have reached a point where developments in technology and chemistry allow us to sequence an entire human genome in a few days for the relatively low cost of circa £1,000. The Human Genome Project revealed there were only about 20,000 genes that code for the proteins that make us who we are – about the same as a starfish. The role of the remainder, around 95 per cent of it, was a mystery.

“We now know that the remaining DNA plays a critical role in determining how and when these proteins are produced, which is why it’s important to sequence the entire genome.”

Rare discoveries

Sequencing the whole genome of people with a rare disease could help us identify atypical variants in the genetic code that actually cause diseases. But these rare variants are hard to find.

“By a rare variant, we mean a genetic variant in your genome that occurs in less than one per cent of the population, and sometimes even more rarely than that,” says Professor Caulfield. “Each of us has many, many rare variants. Getting a clear line of sight on the ones that cause rare inherited heart conditions, for example, can be very challenging, so we need some help.”
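The one per cent threshold described above can be pictured as a simple filter. The sketch below is an illustration only: the variant names and frequencies are made up, and the `Variant` class is a hypothetical structure, not something used by the project.

```python
from dataclasses import dataclass

RARE_THRESHOLD = 0.01  # "less than one per cent of the population"


@dataclass(frozen=True)
class Variant:
    name: str                    # hypothetical label, for illustration only
    population_frequency: float  # fraction of the population carrying it


def rare_variants(variants, threshold=RARE_THRESHOLD):
    """Keep only variants seen in less than `threshold` of the population."""
    return [v for v in variants if v.population_frequency < threshold]


# Made-up example data.
example = [
    Variant("variant_a", 0.25),
    Variant("variant_b", 0.004),
    Variant("variant_c", 0.01),
]
print([v.name for v in rare_variants(example)])  # ['variant_b']
```

Even after a filter like this, each person is left with many rare variants, which is why the project needs families to help narrow the search.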

That help will come from families who have a rare and undiagnosed disease, ideally two parents with an affected child, allowing us to track the rare variant through generations. Participants will come via the NHS and enrol in the programme through NHS Genomic Medicine Centres.

The first wave of these was announced in December 2014. “These centres will offer whole genome sequence to patients with rare inherited diseases who have not obtained genetic diagnoses from existing tests in the NHS. They will have time to think about it before they enrol, with informed consent,” says Professor Caulfield.

Typically, the definition of a rare, inherited disorder is one that affects five people per 10,000. For the 100,000 Genomes Project, the definition will be broader. “I decided when we started that we wouldn’t confine rare diseases to that definition, because it wouldn’t include less rare but very important disorders such as familial hypercholesterolaemia (FH). This affects about one in 200 and is a major cause of premature heart attacks,” says Professor Caulfield.

The project will also observe disorders that affect the heart muscle, including cardiomyopathies and rare disorders of heart rhythm. Professor Caulfield explains: “The mums and dads that are enrolling their children in this programme know that only by understanding the genetic basis of rare diseases do we have much hope of designing better treatments for them.”

Facts and figures

100,000 whole genomes from NHS patients will be sequenced by 2017

4 base components make up our DNA

220GB the amount of data occupied by a single genome

3.3 billion letters in a single human genome

5,000 to 8,000 the estimated number of rare diseases

1 in 500 people are affected by familial hypercholesterolaemia

100,000 sequences

Some parts of the genetic code can be difficult to read, so each genome will be read 30 times. “Reading the genome once is not enough, because you might miss bits,” says Professor Caulfield. “It’s like reading a book: periodically you get to a sentence and think, ‘How did I get there?’ You have to go back and read a bit again because what you’re reading now doesn’t make sense. You realise you have missed a bit. It’s the same reading a gene sequence.”
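To see why repeated reading helps, consider a toy example in which each read occasionally misses or misreads a letter and the final sequence is decided by majority vote at each position. This is a simplified sketch, not the project's actual method: the article does not describe the software used, real sequencing works with short overlapping fragments, and the `consensus` function and sample reads here are invented for illustration.

```python
from collections import Counter


def consensus(reads):
    """Majority vote per position across equal-length reads.

    A '-' marks a letter that a read missed; 'N' is returned for a
    position that no read managed to capture.
    """
    result = []
    for column in zip(*reads):
        counts = Counter(letter for letter in column if letter != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


# Four imperfect reads of the same made-up stretch of DNA.
reads = ["GATT-CA", "GA-TACA", "GATTACA", "GATTAGA"]
print(consensus(reads))  # GATTACA
```

No single read has to be perfect: gaps and the occasional misread letter are outvoted by the other reads, and the more reads there are, the more reliable the vote becomes.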

Once read, the genomes will be added to a databank. Each genome generates around 220GB of data (the storage capacity of 14 average iPhones), so a special data centre is being built to store them, using a £24m investment from the Medical Research Council. The data, Professor Caulfield explains, will be in two parts.

“One will retain an identity, because it’s important we can feed back to patients. A second data centre will store data in a non-identifiable format. We often call this anonymised or pseudonymised data. That data store will be 30 petabytes in size and will have about 30,000 dual processors.”
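The storage figures can be checked with simple arithmetic. The sketch below assumes decimal units (one petabyte equals one million gigabytes) and takes the 220GB-per-genome figure at face value; the names are illustrative only.

```python
GB_PER_GENOME = 220    # "around 220GB of data" per genome
GENOMES = 100_000      # project target
DATA_STORE_PB = 30     # quoted size of the data store
GB_PER_PB = 1_000_000  # assumption: decimal units


def total_petabytes(genomes: int = GENOMES,
                    gb_each: int = GB_PER_GENOME) -> float:
    """Total storage for `genomes` genomes, in petabytes."""
    return genomes * gb_each / GB_PER_PB


needed = total_petabytes()
print(needed)                   # 22.0
print(needed <= DATA_STORE_PB)  # True
```

On these assumptions, 100,000 genomes come to about 22 petabytes, which would fit within the quoted 30-petabyte store with some room to spare.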


Scientists and clinicians will be able to access the raw data, or the genome sequence, as it comes out of the machine that captured it. The newly captured genomes will be aligned to a control genome, ensuring the genome is reassembled in the right order.

Researchers can compare the genome of someone with a rare disease to those of others without it, to find potentially noteworthy variants. Work will be divided into areas.

“There’ll be a cardiovascular domain, with UK leadership,” says Professor Caulfield. “Within each domain, a series of subsets will focus on specific diseases. Some researchers might work on Marfan syndrome, some on FH, some on familial hypertrophic cardiomyopathy, and other disorders as well.”

Giving experts access to the data store will let them compare characteristics across genomes and say, with greater confidence, whether these are likely to be characteristics of a rare disease.

Lifelong screening

Researchers plan to follow project participants through the course of their lives. This will, says Professor Caulfield, give us a clearer picture of how rare disorders progress through middle and later life, providing further clues about treatment.

Already, a rare disease pilot has screened 2,000 people with rare diseases. It has covered 85 diseases, many of which relate to the cardiovascular system. Professor Caulfield believes the project is on track to sequence an initial 10,000 genomes. The remaining 90,000 will be sequenced by 2017, leaving a legacy of Genomic Medicine Centres and a state-of-the-art sequencing centre.

In addition, the project has allocated £25m to provide 700 person-years of education in the form of short courses, PhDs and master’s degrees. These will “drive up the cadre of people able to use this technology in the healthcare system,” says Professor Caulfield. The first master’s courses will begin in 2015.

While it’s early days, hopes are high that the project can deliver much-needed answers. The impact could be far-reaching. “You might ask – why invest in these diseases if they’re rare?” says Professor Caulfield. “But each individual disease affects five in 10,000 people, and there are 7,000 of these rare diseases, so collectively they affect around three million people in the UK. This programme is designed to get all of them a genomic diagnosis for the first time.”

Professor Mark Caulfield

Professor Caulfield is Chief Scientist at Genomics England, Professor of Clinical Pharmacology at Barts and The London School of Medicine and Dentistry, and Director of the William Harvey Research Institute. He specialises in cardiovascular genomics and has led a number of studies into hypertension.
