A majority of the DNA that has been sequenced for research comes from donors of European ancestry. That causes a knowledge gap about the genome of people from the rest of the world.
Of the many things that unite humans around the world, DNA ranks near the top: a whopping 99.9% of the human DNA sequence is identical from one person to the next.
Gregor Mendel, a monk and scientist whose 200th birthday is this Wednesday (July 20), proposed that certain “invisible factors” were responsible for the various characteristics we display. Today, we know that these factors are genes, which are made of DNA, or deoxyribonucleic acid.
This acid molecule gives genetic instructions to living beings. If humans share so much of the same DNA, why is diversity important in the context of DNA sequencing?
To understand that, we have to shift our focus to the 0.1% difference in human DNA sequences. That seemingly small difference stems from variations among the roughly 3 billion bases (nitrogen-containing compounds) in our DNA.
Many of the inherited dissimilarities we see between people, such as hair color, eye color or height, trace back to these variations.
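To put that 0.1% in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it uses the article's rounded figures, and the function name and the per-thousand representation are choices made for this sketch.

```python
# Rounded figures from the article; both are approximations.
GENOME_BASES = 3_000_000_000  # "nearly 3 billion bases"
SHARED_PER_MILLE = 999        # 99.9% identical, written per thousand to keep integer math


def differing_bases(total_bases: int, shared_per_mille: int) -> int:
    """Return the approximate number of bases that differ between two people."""
    return total_bases * (1000 - shared_per_mille) // 1000


if __name__ == "__main__":
    print(f"{differing_bases(GENOME_BASES, SHARED_PER_MILLE):,} bases")  # prints "3,000,000 bases"
```

In other words, a difference of one-tenth of one percent still leaves room for roughly 3 million points of variation.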
However, over the years scientists found that these variations could also give us vital information on a person’s or a population’s risk for developing a specific disease.
We can then use the risk assessment from the genetic data to design a health-care strategy that is tailored to the individual.
Genetics and disease risk assessment
Many of us have had the experience of filling out forms at the doctor’s office that ask about the diseases our parents or other relatives have suffered from. You are warned to stay away from sweets and processed sugars if a parent was diabetic, for example.
While it is commonly known that heart disease, cancer and diabetes can run in families, many more diseases can be inherited genetically.
For example, we know that sickle cell anemia occurs when a person inherits two abnormal copies, one from each parent, of the gene that makes hemoglobin, a protein in our red blood cells.
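The "one from each parent" logic can be illustrated with a small sketch. It assumes, for illustration only, that both parents are carriers with one typical and one sickle copy of the gene, and that each copy is equally likely to be passed on. The allele labels "A" and "S" and the function name are hypothetical choices for this example.

```python
from collections import Counter
from itertools import product


def offspring_odds(parent_a: str, parent_b: str) -> dict[str, float]:
    """Enumerate every equally likely pairing of one gene copy from each parent."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# "A" = typical hemoglobin copy, "S" = sickle copy (labels assumed for this sketch).
odds = offspring_odds("AS", "AS")
print(odds["SS"])  # 0.25: two abnormal copies, the case described above
print(odds["AS"])  # 0.5: one abnormal copy (a carrier)
print(odds["AA"])  # 0.25: no abnormal copy
```

Under those assumptions, each child of two carriers has a one-in-four chance of inheriting both abnormal copies.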
In recent decades, genetic research has advanced to the point that scientists can isolate the genes responsible for many of these diseases.
Here’s the catch: We know these links between genes and diseases only for a very restricted population.
Euro-centric data
Sarah Tishkoff, a geneticist and evolutionary biologist at the University of Pennsylvania in the US, is one of many in the scientific community pushing for more diverse genomic datasets.
“Let’s say that a study focused on people with European ancestry identifies genetic variants associated with risk for heart disease or diabetes, and uses that information to predict risk for disease in patients not included in the original study,” said Tishkoff.
“We know from experience that this prediction of disease risk doesn’t work well when applied to individuals with different ancestries, particularly if they have African ancestry.”
Historically, the people who have provided their DNA for genomics research have been overwhelmingly of European ancestry, “which creates gaps in knowledge about the genomes from people in the rest of the world,” according to the National Human Genome Research Institute (NHGRI) in the US.
The institute states that 87% of all the genome data we have is from individuals of European ancestry, followed by 10% of Asian and 2% of African ancestry.
As a result, the potential benefits of genetic research, which include earlier diagnosis and better treatment of various diseases, may not reach underrepresented populations.
Lack of equity in treatment
The problem does not stop with disease risk assessment. It permeates the space of equitable health care as well, says Jan Witkowski, a professor from the Graduate School of Biological Sciences at the Cold Spring Harbor Laboratory in the US state of New York.
“Say you have two groups: group A and group B, who are very different. The knowledge and information you learn about people in group A may not apply to people in group B. But imagine developing medical treatments based on information from just group A for everyone,” he said, adding, “it is not going to work on group B.”
By including diverse populations in genomic studies, researchers can identify genomic variants associated with various health outcomes at both the individual and population levels.
The NHGRI also states, however, that diversifying the participants in genomics research is an expensive affair and requires the establishment of trust and respectful long-term relationships between communities and researchers.