Saturday, October 5, 2013

The Mermaid's Tale: Incidentally,...... (Interpreting incidental findings from DNA sequence data)

Genome sequencing yields masses of data. It's one of the founding justifications of the currently fashionable term Big Data. The jury is still out on how much of it is clinically meaningful, compared with other sorts of data; that's fine, it's early days yet. But too much current thinking seems to rest on ideas that are outdated in fundamental ways. It would help to get beyond this.

A new paper in The American Journal of Human Genetics ("Actionable, Pathogenic Incidental Findings in 1,000 Participants’ Exomes," Dorschner et al.) reports on a study of gene variants in the exomes of 1,000 participants in the National Heart, Lung, and Blood Institute Exome Sequencing Project. How many variants associated with possibly undiagnosed genetic conditions does each individual carry? The study addresses the question of how much individuals should be told about "incidental findings" in their genome or exome sequences.

The investigators looked at single nucleotide variants in 114 genes in the exomes of 500 European Americans and 500 African Americans. They found 585 instances of 239 unique variants identified by the Human Gene Mutation Database (HGMD) as disease-causing. Of these, 16 autosomal-dominant variants in 17 people were thought to be potentially pathogenic; one individual had 2 variants. A smattering of other variants not listed in HGMD was found as well. The paper reports a frequency of pathogenic variants of ~3.4% in individuals of European descent and ~1.2% in those of African descent.

The 114 genes were chosen by a panel of experts, and pathogenicity was determined by the same panel.
“Actionable” genes in adults were defined as having deleterious mutation(s) whose penetrance would result in specific, defined medical recommendation(s) both supported by evidence and, when implemented, expected to improve an outcome(s) in terms of mortality or the avoidance of significant morbidity.
Variants were classified as pathogenic, likely pathogenic VUS (variant of uncertain significance), VUS, or likely benign VUS. Classification criteria included the variant's allele frequency (a variant that was rare in the healthy population, relative to the disease frequency, was considered more likely to be pathogenic than a common one), segregation evidence, the number of reports of affected individuals carrying the variant, and whether the variant had been reported as a new (de novo) mutation. The group decided not to return incidental VUS findings to participants if the variant was in a gene unrelated to the reason they had been included in the study in the first place.

Reviewers of the data followed stringent criteria to classify alleles. For example, variants were considered suspect if HGMD listed them as disease-causing. But they were not considered disease-causing if the allele was so common, relative to the frequency of the disease, that it couldn't be causal on its own.

This raises a point we've made before, too often to a deaf audience: the variant is not 'dominant', and the 150-year-old term due to Mendel should be dropped in favor of a more accurate conception of probabilistic causation (see below). If the allele is more common than the disease, other alleles or factors must also be involved. Or perhaps the allele was improperly classified as causal in the first place.
[Image: Punnett square showing the results of crossing yellow and green peas. Source: Wikipedia]

And, "maximum allowable allele frequencies for each disease were calculated under a very conservative model, including the assumption that the given disorder was wholly due to that variant." When disease frequencies weren't known, they were overestimated. But is dominance a function of disease frequency? Suggesting such a thing should raise big red flags about the semantics and the conceptual frameworks being used.
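To make the quoted "very conservative model" concrete, here is a minimal sketch under our own assumptions; it is not Dorschner et al.'s actual calculation, which isn't spelled out here. Assume a dominant disorder with prevalence P, genotypes in Hardy-Weinberg proportions, and every case caused by the one variant. Then P equals the carrier frequency 1 - (1 - q)^2, so the largest allele frequency q compatible with P is 1 - sqrt(1 - P), roughly P/2 for a rare disease. The function names, the optional penetrance parameter, and the example numbers are hypothetical.

```python
import math


def max_allowable_allele_frequency(prevalence: float, penetrance: float = 1.0) -> float:
    """Largest allele frequency q compatible with a dominant disorder's prevalence.

    Assumptions (ours, for illustration): genotypes are in Hardy-Weinberg
    proportions and every case is caused by this one variant, so
        prevalence = penetrance * (1 - (1 - q)**2).
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    carrier_freq = prevalence / penetrance  # share of people who must carry the allele
    if carrier_freq >= 1:
        return 1.0  # the prevalence puts no useful ceiling on q
    return 1 - math.sqrt(1 - carrier_freq)


def too_common_to_be_sole_cause(observed_af: float, prevalence: float,
                                penetrance: float = 1.0) -> bool:
    """True if a variant is more frequent than the model's ceiling allows."""
    return observed_af > max_allowable_allele_frequency(prevalence, penetrance)


# Hypothetical disorder affecting 1 in 500 people; the ceiling is about 0.001.
print(max_allowable_allele_frequency(1 / 500))
print(too_common_to_be_sole_cause(0.01, 1 / 500))  # True
```

Lowering the assumed penetrance raises the ceiling. Whether a variant "can be" a dominant cause therefore depends on the assumed disease frequency and penetrance, not on anything intrinsic to the allele.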
The eight participants with confirmed pathogenic (versus likely pathogenic) mutations included three with increased risk of breast and ovarian cancer (MIM 604370, caused by BRCA1 mutations, or MIM 612555, caused by BRCA2 mutations), one with a mutation in LDLR, associated with familial hypercholesterolemia (MIM 143890), one with a mutation in PMS2, associated with Lynch syndrome (MIM 614337), and two with mutations in MYBPC3, associated with hypertrophic cardiomyopathy (MIM 115197), as well as one person with two SERPINA1 mutations, associated with the autosomal-recessive disorder alpha-1-antitrypsin deficiency (MIM 613490).
Fewer actionable alleles were found in African Americans than in European Americans, presumably because fewer studies have been done in this population and so fewer causal alleles have been identified. Dorschner et al. did not have access to the phenotypes of the people in this study, so they can't know their health status with respect to these variants. Nor, of course, can they know whether individuals might have a condition for which no causal allele was found.

They report few pathogenic alleles, though the fact that they looked only for alleles associated with adult-onset conditions could partly explain this. Their criteria were also stringent. And they were looking only for single-gene disorders, so it's no surprise that they identified so few potentially pathogenic alleles. Single-gene disorders are, of course, only a small subset of the conditions that might affect us.

A 2011 Cell paper we've mentioned before ("Exome sequencing of ion channel genes reveals complex variant profiles confounding personal risk assessment in epilepsy," Klassen et al.) approaches this question from a different angle. Klassen et al. compared the sequences of 237 ion channel genes (known to be associated with epilepsy) in affected and unaffected people. They found rare variants in Mendelian disease genes at equivalent prevalence in both groups. That is, healthy people were as likely to carry purportedly causal variants as people with sporadic, idiopathic epilepsy. The authors caution that finding a variant is only a first step.

The unjustified dominance of 'dominance'
We feel compelled to comment again, and further, on the terminological gestalt involved in papers such as these, as we did yesterday (and in earlier posts).

The concept of dominant single-locus causation goes back to Mendel, who carefully chose traits in peas that worked that way.  He knew other traits didn't.  He learned that not all 'dominant' traits showed 'Mendelian' inheritance and even came to doubt his own theory later in life.

There are traits that seem to 'segregate' in families in classical Mendelian fashion. For such traits there is a qualitative (e.g., yes/no) relationship between genotype and phenotype (trait). When a dominant allele is present, you always get the trait. This means that, in the case of dominance, about half of the offspring of an affected (heterozygous) parent and an unaffected one are affected. Much of the 20th century in human genetics was spent trying to fit patterns of inheritance to such single-gene models. But it was very clear that dominance (even when there was evidence for it) wasn't dominance! The traits were not always black and white (or should we say green and yellow?). And the segregation proportion wasn't 50%. What to do?
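The "about half" figure comes from enumerating the Punnett square for an affected heterozygous parent (Aa) and an unaffected parent (aa), assuming complete dominance, so that any offspring carrying 'A' shows the trait. This is our own illustrative sketch; the allele labels and function names are arbitrary.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> list[str]:
    """All equally likely offspring genotypes from two diploid parents (a Punnett square)."""
    # Each parent transmits one of its two alleles with probability 1/2.
    # sorted() puts uppercase before lowercase, so 'aA' is written as 'Aa'.
    return ["".join(sorted(pair)) for pair in product(parent1, parent2)]


def affected_fraction(parent1: str, parent2: str, dominant: str = "A") -> Fraction:
    """Fraction of offspring carrying at least one copy of the dominant allele."""
    genotypes = offspring_genotypes(parent1, parent2)
    affected = sum(dominant in g for g in genotypes)
    return Fraction(affected, len(genotypes))


print(offspring_genotypes("Aa", "aa"))  # ['Aa', 'Aa', 'aa', 'aa']
print(affected_fraction("Aa", "aa"))    # 1/2
```

Crossing two heterozygotes ("Aa" with "Aa") gives 3/4 affected under the same assumptions, the familiar 3:1 ratio from Mendel's peas.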

The conviction that the trait was due to the effects of a single locus seemed to have strong support, so the concept of 'penetrance' was introduced. This is not the first time a fudge factor has been used to force a model to fit data that it didn't really fit. The idea in this case is that the allele (variant) was inherited from the parent with 50% probability but, once present, did not always cause the trait. If you add an 'incomplete penetrance' factor that can vary from 0 to 1, you can fit a whole lot of data that otherwise wouldn't support Mendelian causation.
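To see why a free penetrance parameter is so accommodating, take the simplest version of the model just described: offspring of an affected heterozygote inherit the allele with probability 1/2, and carriers are affected with probability f, the penetrance. The expected affected fraction is then f/2, so any observed segregation ratio between 0 and 1/2 can be "explained" by solving for f. This is a minimal sketch with hypothetical numbers and function names; it ignores phenocopies and ascertainment bias.

```python
def expected_affected_fraction(penetrance: float, transmission: float = 0.5) -> float:
    """Expected share of offspring affected under a single-locus dominant model."""
    return transmission * penetrance


def fitted_penetrance(observed_fraction: float, transmission: float = 0.5) -> float:
    """Penetrance that makes the single-locus model reproduce an observed ratio.

    Ratios above the transmission probability cannot be fitted by penetrance
    alone, so this raises an error instead of returning a value above 1.
    """
    f = observed_fraction / transmission
    if not 0 <= f <= 1:
        raise ValueError("observed ratio cannot be fitted with penetrance in [0, 1]")
    return f


# Hypothetical family data: 30 of 100 offspring of affected parents are affected.
f = fitted_penetrance(30 / 100)
print(f)                              # 0.6
print(expected_affected_fraction(f))  # 0.3
```

Every observed ratio at or below 1/2 yields some penetrance value, which is exactly why the parameter makes the single-locus model so hard to reject.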

What we now know is that there are many variants in genes associated with single-gene traits, that other genes almost always contribute as well (as, most of the time, do environmental factors), and that the trait itself is quantitative: the same dominant 'A' allele doesn't always produce the same degree of severity, and so on. In other words, the trait is mainly (in a statistical sense) due to variation in a single gene, but the effect depends on the specific allele involved in a given case and is also affected by the rest of the genome.

So there is a quantitative relationship between genotype and phenotype. This is a general, accurate description of the pattern of causation. The pattern is not 'Mendelian', and the causation is not dominant. Or, to be clearer, the extreme cases are close to, or even exactly, single-allele dominant, but this is the exception that proves (tests) the quantitative-relationship rule.
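One standard way to express this probabilistic view is a liability-threshold model from quantitative genetics (our choice of illustration, not something used in the papers discussed above). Each person's liability is the sum of a major-gene allele effect, a polygenic background and an environmental contribution, and the trait appears only when liability crosses a threshold. In this hypothetical sketch every effect size and the threshold are invented. Carriers of the "dominant" allele are affected far more often than non-carriers, but not always, with no separate penetrance fudge factor.

```python
import random


def simulate_affected_rate(carrier: bool, n: int = 50_000, allele_effect: float = 3.0,
                           threshold: float = 2.5, seed: int = 1) -> float:
    """Fraction of simulated people whose liability exceeds a threshold.

    Hypothetical model (all parameters invented for illustration):
        liability = allele_effect (carriers only)
                    + polygenic background ~ Normal(mean 0, SD 1)
                    + environment ~ Normal(mean 0, SD 0.5)
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    affected = 0
    for _ in range(n):
        background = rng.gauss(0.0, 1.0)   # contribution of the rest of the genome
        environment = rng.gauss(0.0, 0.5)  # non-genetic factors
        liability = (allele_effect if carrier else 0.0) + background + environment
        affected += liability > threshold  # True counts as 1
    return affected / n


carrier_rate = simulate_affected_rate(carrier=True)
noncarrier_rate = simulate_affected_rate(carrier=False)
print(f"carriers affected:     {carrier_rate:.3f}")
print(f"non-carriers affected: {noncarrier_rate:.3f}")
```

With these made-up settings, roughly two-thirds of carriers and about 1% of non-carriers exceed the threshold. Raising or lowering allele_effect, standing in for different alleles of the same gene, moves the carrier rate up or down.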

We should stop being so misled by constraining legacy terminology. Mendel did great work.  But we don't have to keep working with obsolete ideas.
