The Missing Equation

Dear readers,

For many people, the word “mathematics” immediately evokes equations. Equations certainly stand for something mathematics does very well – take several quantities in symbolic form and join them together into a single, concise yet powerful statement. Equations can provide tremendous new insights into the way the quantities relate to each other, and revolutionize subsequent developments in science.

This is precisely the argument made by the fantastic mathematician and writer Ian Stewart in a recently published book. The book describes the meaning of 17 equations that literally changed the world. Six of them are simply definitions: logarithms, derivatives, imaginary numbers, the normal distribution, the Fourier transform, and Shannon’s information entropy all fall into this category. Three others uncover relationships between mathematical quantities: the Pythagorean theorem, Euler’s formula, and the logistic growth model. Seven others are fundamental physical equations: Newton’s law of gravitation, the wave equation, the Navier-Stokes equation, Maxwell’s equations, Einstein’s famous E = mc², and the Schrödinger equation, as well as the second law of thermodynamics, which is actually an inequality. The last equation is the Black-Scholes equation from economics, whose incorrect application arguably contributed to the financial crisis of recent years; I’m looking forward to reading more on this soon.

While this selection of influential equations is necessarily limited, I was somewhat disappointed not to see any equations from my field, mathematical biology. It is true that there aren’t as many equations in mathematical biology as there are in physics, and their role is somewhat more limited. But there is one equation which I believe deserves a mention, perhaps in the next edition of Ian Stewart’s book: the defining equation of the Hardy-Weinberg principle, p² + 2pq + q² = 1. Incidentally, the first part of the principle’s name comes from the same Hardy whom I quoted in an earlier post stating his pride in the fact that none of his work in mathematics would ever have any applications. Weinberg was a German scientist who independently discovered this principle at the same time as Hardy, as seems to happen so frequently with good ideas. After explaining what the equation means, I’ll show how it applies to blood types, forensics, and the search for genetic markers of diseases, in spite of Hardy’s claim.

Suppose that we have a single human genetic locus (position), which can have either one of two alleles (forms). These two alleles are commonly denoted by A and a; for convenience, A is called dominant and a, recessive. Since every person inherits one copy of the gene from their father and one from their mother, their genotype (underlying genetic makeup) can be either AA (homozygous dominant), Aa (heterozygous), or aa (homozygous recessive). The Hardy-Weinberg principle tells us what happens if the allele frequencies do not differ between males and females, and the mating happens randomly (to be contrasted with assortative mating, where individuals prefer to mate with those whose genes are similar to their own). In this case, if p is the frequency of allele A and q = 1 - p is the frequency of allele a, then the frequency of AA is p², that of Aa is 2pq (the factor of 2 is needed because we ignore the order, i.e. Aa and aA are the same), and that of aa is q². The equation tells us that these frequencies add up to 1, since p² + 2pq + q² = (p + q)²; the principle itself tells us that these frequencies do not vary from one generation to the next under the conditions of random mating, provided there is no mutation, migration, or natural selection happening.
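For readers who like to see the arithmetic, here is a minimal Python sketch of the two-allele case. The function names and the starting population are my own illustrative choices, not anything standard. It computes the genotype frequencies from p and then applies one round of random mating (with no mutation, migration or selection) to show that the frequencies settle after a single generation and stay put.

```python
def genotype_frequencies(p):
    """Return Hardy-Weinberg frequencies (AA, Aa, aa) given the frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(f_AA, f_Aa, f_aa):
    """Frequency of allele A: every allele in AA carriers plus half of those in Aa carriers."""
    return f_AA + 0.5 * f_Aa


def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating.

    Under random mating each offspring draws two alleles independently
    from the parental allele pool, so only the allele frequency matters.
    """
    return genotype_frequencies(allele_frequency(f_AA, f_Aa, f_aa))


# Hypothetical starting population far from equilibrium: no heterozygotes at all.
start = (0.5, 0.0, 0.5)
gen1 = next_generation(*start)
gen2 = next_generation(*gen1)
print(gen1)  # (0.25, 0.5, 0.25)
print(gen2)  # (0.25, 0.5, 0.25), unchanged from the previous generation
```

The allele frequency p = 0.5 never changes here, which is why the genotype frequencies are fixed from the first generation onward.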

This idea can be generalized to more than two alleles. For instance, blood type can be described as a trait with three alleles, A, B and O, with O being recessive. This means that there are four possible blood phenotypes: A (which could come from genotypes AA or AO), B (which could come from BB or BO), AB, or O (which could only come from OO because O is recessive). Knowing the frequencies of the A, B and O alleles (say, p, q and r) would allow us to calculate the frequencies of each blood type: p² + 2pr, q² + 2qr, 2pq and r², respectively (you can easily check that they add up to 1 if p + q + r = 1, since their sum is (p + q + r)²). In the same way, knowing the frequencies of each blood type can give us the frequencies of each allele. We can also test for significant deviations from the equilibrium frequencies in a population to see if the mating between people of different blood types happens randomly. For example, if we look at Canada, where the fractions are roughly 46%, 42%, 9% and 3% for types O, A, B and AB, we get r = √0.46 ≈ 0.68, p = √0.88 - r ≈ 0.26 and q = √0.55 - r ≈ 0.06. These estimates add up to almost exactly 1, and the predicted AB frequency 2pq ≈ 3.3% is close to the observed 3%, so these figures are consistent with the Hardy-Weinberg law and give no evidence against random mating with respect to blood type. Estimates that added up to noticeably more or less than 1 would have pointed to non-random mating.
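The blood type calculation can be sketched in a few lines of Python. The function names are mine, and I am assuming that the four Canadian percentages are to be read in the order O, A, B, AB (the figures alone do not say which is which, so treat this as an assumption). The estimator uses only the A, B and O frequencies, via O = r², A + O = (p + r)² and B + O = (q + r)², which leaves the AB frequency free to serve as a check.

```python
import math


def phenotype_frequencies(p, q, r):
    """Blood-type frequencies from allele frequencies p (A), q (B) and r (O)."""
    return {
        "A": p * p + 2 * p * r,   # genotypes AA and AO
        "B": q * q + 2 * q * r,   # genotypes BB and BO
        "AB": 2 * p * q,          # genotype AB
        "O": r * r,               # genotype OO
    }


def estimate_alleles(freq_A, freq_B, freq_O):
    """Estimate (p, q, r) from phenotype frequencies, assuming Hardy-Weinberg.

    Uses O = r^2, A + O = (p + r)^2 and B + O = (q + r)^2.
    The AB frequency is deliberately left out so it can be used as a check.
    """
    r = math.sqrt(freq_O)
    p = math.sqrt(freq_A + freq_O) - r
    q = math.sqrt(freq_B + freq_O) - r
    return p, q, r


# ASSUMPTION: the percentages 46, 42, 9, 3 are read as O, A, B, AB.
observed = {"O": 0.46, "A": 0.42, "B": 0.09, "AB": 0.03}
p, q, r = estimate_alleles(observed["A"], observed["B"], observed["O"])

print(f"p = {p:.3f}, q = {q:.3f}, r = {r:.3f}, sum = {p + q + r:.3f}")
print(f"predicted AB = {2 * p * q:.3f}, observed AB = {observed['AB']:.3f}")
```

Under this reading the three estimates add up to very nearly 1 and the predicted AB frequency comes out at about 3.3%. Assigning the same four percentages to the blood types in a different order gives estimates that do not add up to 1, so the conclusion depends entirely on getting the labels right.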

When forensics has to answer the frequently asked question of how likely a person is to have the same genetic variants as the person who committed a certain crime, the Hardy-Weinberg principle plays an important role. It is typically used to justify the assumption that the probability of a person having a given set of genetic variants is the product of the individual probabilities (frequencies) of the variants. However, in some specific cases, this assumption may not be justified because of deviations from the equation we have been discussing, and this may (at least in principle) lead to some false convictions.
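To make the product rule concrete, here is a small Python sketch. Everything in it is hypothetical: the allele frequencies are made up, the function names are mine, and the loci are assumed to be independent of each other, which is an extra assumption on top of Hardy-Weinberg. Each locus contributes p² for a homozygote or 2pq for a heterozygote, and the contributions are multiplied together.

```python
from math import prod


def locus_probability(freq_first, freq_second=None):
    """Hardy-Weinberg probability of a genotype at a single locus.

    One frequency means a homozygote (p^2);
    two frequencies mean a heterozygote (2pq).
    """
    if freq_second is None:
        return freq_first ** 2
    return 2 * freq_first * freq_second


def profile_probability(loci):
    """Product rule: multiply the per-locus genotype probabilities.

    ASSUMPTION: the loci are statistically independent of each other.
    """
    return prod(locus_probability(*locus) for locus in loci)


# Hypothetical profile: allele frequencies at three loci (made-up numbers).
profile = [
    (0.1, 0.2),   # heterozygote: 2 * 0.1 * 0.2 = 0.04
    (0.05,),      # homozygote:   0.05 ** 2     = 0.0025
    (0.3, 0.1),   # heterozygote: 2 * 0.3 * 0.1 = 0.06
]

match_probability = profile_probability(profile)
print(f"random match probability: {match_probability:.2e}")
```

With these made-up numbers the profile is expected in roughly 6 people per million. Real forensic profiles use many more loci, which is why the resulting probabilities are so small, and also why a violated assumption can matter so much.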

One final application of the Hardy-Weinberg equation comes from the search for the genetic causes of complex diseases. In this situation, we typically look at a large number of people who have the disease (called cases) and a large number of people who don’t have it (called controls). A genetic variant that deviates from the Hardy-Weinberg equation in the cases, but not in the controls, is a promising candidate for association with the disease. Of course, after some spurious discoveries in the early days after the Human Genome Project was completed, an association is now considered to be confirmed only when it has been found in a group of cases distinct from the initial group (called a replication cohort). As for the variants that deviate from the Hardy-Weinberg equation in the controls, they are usually discarded from the analysis as unreliable. In this way the equation is an important tool in the analysis of genome-wide association studies (GWAS), the tool of choice for the discovery of genetic disease markers.
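Here is a rough Python sketch of what "deviating from the Hardy-Weinberg equation" can mean in practice. I am using a plain chi-square goodness-of-fit test with one degree of freedom as a simple stand-in; actual studies may well use other tests, and the genotype counts below are invented for illustration. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    Returns (chi2, p_value) using 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # observed frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("only one allele is present, so the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) for a single variant.
chi2_controls, p_controls = hwe_chi_square(360, 480, 160)
chi2_cases, p_cases = hwe_chi_square(420, 360, 220)

print(f"controls: chi2 = {chi2_controls:.2f}, p-value = {p_controls:.3g}")
print(f"cases:    chi2 = {chi2_cases:.2f}, p-value = {p_cases:.3g}")
```

In this invented example the controls match the expected proportions almost exactly, while the cases have far too few heterozygotes (a chi-square of 62.5), which is the pattern described above. A variant showing a deviation like that in the controls would instead be flagged as unreliable.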

I hope I have convinced you that the Hardy-Weinberg equation deserves to be included among the most influential equations of our times. What other equations would you suggest adding to Ian Stewart’s list?
