Other blogs I read

Dear friends,

In today’s post, I want to share with you some of the other blogs I read regularly, both mathematical and non-mathematical. I have six in each category, and they will eventually form part of my blogroll once I figure out how to create it. I hope you will enjoy them as much as I do. And although it currently seems to be inactive, I want to mention Steven Strogatz’s blog at the New York Times, which was a source of enormous inspiration for my own.

Mathematical blogs

1) Terence Tao’s blog contains lots of technical material from Terence’s work in various areas of math, but also advice to mathematicians and fairly accessible expositions of new mathematical developments.

2) Timothy Gowers’ blog discusses interesting mathematical topics, but also projects such as Polymath, a collaborative platform for mathematicians, and issues such as open-access mathematics journals.

3) Peter Cameron’s blog discusses interesting connections between combinatorics, algebra and discrete mathematics, and also touches on issues such as mathematics education and the running of universities.

4) Scott Aaronson’s blog focuses on complexity theory, the area of mathematics concerned with the amount of resources needed to solve various problems on a computer, as well as quantum computing.

5) Maria Monks’ blog collects various mathematical “gemstones”, which are either problems with nice solutions or useful techniques, and classifies them by difficulty level; it offers something to everyone.

6) Sam Bankman-Fried’s blog contains mainly game-theoretic analysis and simulations of sports events and elections, but also his reflections on rational philosophy and other not strictly mathematical topics.

Non-mathematical blogs

1) Hillary Rettig’s blog contains a wealth of valuable advice for dealing with productivity barriers and other work (and life) issues directed at activists, writers, academics and all people with ambitious goals.

2) Chris Dippel’s blog, Retronyma, analyzes the latest developments in the field of global health, more specifically the biotechnology and medical innovations being made available in low-income countries.

3) Pragya Bhagat’s blog, The Road Not Taken, documents her return to her native India and her work on understanding and addressing the sources of various socioeconomic problems in rural parts of India.

4) Assaf Urieli’s blog, Moyshele, describes his various projects, such as translations of songs between English, French and Yiddish, publication of a book of riddles, and the adoption of a child from Russia.

5) Brooke Shields’ blog, Veggie 365, lists her delicious vegan recipes for breakfast, lunch, dinner, and, of course, dessert. Every recipe is accompanied by a photograph and some are quite easy to make.

6) Jonathan Sharman’s blog, which he just started, describes his experience as an experimental and theoretical physicist (some of the projects will also appear on video) and his thoughts on science today.

What other blogs have you found particularly interesting? Please let me know in the comments!

Acting, academia and industry: an interview with Mélodie Mouffe

Dear readers,

It is my pleasure to introduce to you today Mélodie Mouffe, a young mathematician from Belgium whose career path has so far taken her from academia into industry, roughly the opposite direction of my own. By discussing her experience in both environments, I hope to give you a better idea of the different challenges faced by mathematicians in each, and of the tradeoffs between the two.

Mélodie’s interests extend beyond mathematics research into teaching, as well as music and acting. We were able to touch on a number of interesting topics in our conversation, which I hope you will enjoy! Here it is.

Mathematics of Relationships

Dear readers,

Many people around the world today will be celebrating Valentine’s Day, a day that celebrates romantic relationships. It may be surprising to you that mathematics can say something helpful on that topic, a never-ending source of complexity that resists clean definitions. Nevertheless, mathematics can in fact provide useful insights in some specific contexts, which I’m going to talk about in today’s post.

Perhaps some of you are considering making a marriage proposal to your significant other, or will have to respond to one (for those of you who don’t like the idea of marriage, feel free to substitute civil union or whatever other form of commitment works for you, as long as it entails long-term exclusivity). In case you have to respond to such a proposal, what is the best way to decide whether to say yes or no? This is a difficult problem, involving a host of variables such as compatibility, emotional connection, similarity of values, and chemistry. One of the challenges of this situation is the fact that, assuming that you’ve been in several other relationships before, your current partner is unlikely to be superior to all your previous romantic partners on all these criteria. In fact, it’s easy to see that the more partners you’ve had, the less likely the current one is to simultaneously maximize all these criteria. But suppose for the moment that you’re able to compare any two of your partners. What’s the optimal decision then?

There is a simple mathematical answer to this question, originally known as the fiancée problem, which was solved in 1958. It depends on a number of critical assumptions. First, you have to be able to give each partner a “suitability score”. Second, you have to assume that the partners present themselves in an order that’s random with respect to this score. Third, you have to respond immediately and can’t change your mind later. Lastly, you need to know how many partners you might have to choose from over your lifetime, which we will call n. Under those conditions, the optimal strategy for choosing the best possible partner turns out to be to say no to roughly the first n/e partners, then to say yes to the first one who happens to be better than every other one you’ve considered so far. Here, e = 2.71828… is Euler’s number, the base of the natural logarithm. This strategy gives you a roughly 1/e (or about 37%) chance of finding the best partner for large n. For small values of n, there is an explicit formula. For example, if you expect to have 8 to 10 partners, you should say no to the first 3 and then yes to the first one who is better than all the previous ones, which will give you an approximately 40% chance of saying yes to the best one.
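
If you would like to check these numbers for yourself, here is a minimal simulation sketch in Python; the function names and the choice of n = 100 are mine, purely for illustration, and the random scores simply stand in for the “suitability score” mentioned above.

```python
import math
import random

def best_partner_found(n: int, cutoff: int) -> bool:
    """Simulate one run of the fiancée (secretary) problem.

    Partners arrive in random order with distinct suitability scores;
    we reject the first `cutoff` of them, then accept the first one who
    beats everyone seen so far. Returns True if the accepted partner is
    actually the best of all n.
    """
    scores = [random.random() for _ in range(n)]
    best_so_far = max(scores[:cutoff]) if cutoff > 0 else float("-inf")
    for i in range(cutoff, n):
        if scores[i] > best_so_far:
            # We accept this partner; success means they are the overall best.
            return scores[i] == max(scores)
    # Nobody after the cutoff beat the early ones; we end up with the last partner.
    return scores[-1] == max(scores)

def success_rate(n: int, cutoff: int, trials: int = 20_000) -> float:
    return sum(best_partner_found(n, cutoff) for _ in range(trials)) / trials

if __name__ == "__main__":
    n = 100
    cutoff = round(n / math.e)   # reject roughly the first n/e partners
    print(f"n = {n}, cutoff = {cutoff}, success rate ≈ {success_rate(n, cutoff):.3f}")
    # With n = 10 and cutoff = 3, the rate comes out near 0.40,
    # matching the small-n figure quoted above.
```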

Of course, finding the best possible partner is not a guarantee of the stability of the marriage, because it may happen that other options will present themselves to either yourself or your partner later on, and an incentive to deviate from exclusivity (or towards commitment with a different person) will appear. Can we guarantee that this doesn’t happen? Not an easy task, when both infidelity and divorce rates are so high. But part of the problem lies in the fact that we as a society have more options now than we ever had, and we can no longer exhaustively evaluate all of them ahead of time when making a decision. Suppose, however, that all of us do know all of our options ahead of time, and can rank them without any ties. Is there a way to guarantee stable marriages in this case? It turns out that there is, provided, of course, that there are as many men as women (I do realize that this is a heteronormative restriction, but it will make the discussion easier – in general, the important thing is that there are two distinct groups of people who can only form marriages with someone from the other group, not from the same group).

This problem, known as the stable marriage problem, was solved in the early 1960s by David Gale and Lloyd Shapley, the latter being a co-laureate of the latest Nobel Prize in Economics for his work on the theory of stable allocations. The solution is both natural and very elegant. Once again, we use n to denote the number of men (and women) who need to be married. Each woman has a ranking of the men, and each man, of the women. Initially, none of them is engaged. The solution proceeds by rounds. In each round, an unengaged man proposes to the highest-ranked woman on his list to whom he has not proposed yet. The woman always provisionally says “maybe” to him if she is not engaged or is engaged to someone she prefers less (in which case the man she is currently engaged to gets the initial “maybe” changed to a “no”), and “no” otherwise. At the end of this process, every man will be engaged to some woman, and marriages occur. These marriages will be stable in the sense that no man and woman will want to run away with each other from their spouses. Indeed, if a man, say Bob, likes a woman, say Alice, more than his wife, this means that he proposed to Alice before he proposed to his wife. But then, Alice must have said “no” to him at some point (otherwise they would be engaged at the end), so she prefers her assigned spouse to Bob.
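
To make the rounds of proposals concrete, here is a short Python sketch of the procedure just described; the three couples and their preference lists are invented purely for illustration.

```python
def stable_marriage(men_prefs, women_prefs):
    """Gale-Shapley: men propose in rounds, women hold the best proposal so far.

    men_prefs[m] is m's ranking of women (best first);
    women_prefs[w] is w's ranking of men (best first).
    Returns a dict mapping each woman to her final fiancé.
    """
    # rank[w][m] = position of man m in woman w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # next woman each man will propose to
    engaged_to = {}                           # woman -> man she has said "maybe" to
    free_men = list(men_prefs)

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m                 # "maybe"
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m                 # old fiancé's "maybe" becomes "no"
            free_men.append(current)
        else:
            free_men.append(m)                # "no": m will try his next choice later
    return engaged_to

# A toy instance with three couples (names and preferences are made up).
men = {"Bob": ["Alice", "Carol", "Dana"],
       "Eric": ["Carol", "Alice", "Dana"],
       "Fred": ["Alice", "Dana", "Carol"]}
women = {"Alice": ["Eric", "Bob", "Fred"],
         "Carol": ["Bob", "Eric", "Fred"],
         "Dana": ["Bob", "Fred", "Eric"]}
print(stable_marriage(men, women))
```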

An interesting side note is that the marriages that result from this process are optimal for the men (in the sense that no man can do better in a stable solution than the spouse he ends up with). This is due to the fact that a man is in principle able to propose to every woman on his list, while a woman may only get a limited number of proposals. From that point of view, it pays to be proactive in making proposals (though I’m not necessarily advocating for a reversal of the “traditional” model based on this analysis).

Many of you will be quick to object that the two problems I discussed so far – the fiancée problem and the stable marriage problem – are fairly unrealistic and do not reflect the way our society works. That’s a fair criticism, to which I can only respond that all mathematical models, especially those that have an elegant analysis or solution, involve simplifications of reality. But there is also a more serious approach to the mathematical modeling of relationship dynamics that appeared a few years ago. It is based on a system of differential equations, and, like any good model, it has parameters that can be estimated from experiments and makes falsifiable predictions. The book describing this approach has made it onto my reading list, right behind Nate Silver’s book that I mentioned in an earlier post, and I’ll report on it as soon as I get to read it. Meanwhile, I wish you all a happy Valentine’s Day, no matter your relationship status! And remember that mathematics is like love: a simple idea, but it can get complicated.

Chocolate, chance and choosing a problem

Dear readers,

So far in my blog I’ve described various areas of mathematics, discussed common stereotypes and misconceptions about mathematicians, and even interviewed several of them. However, I’ve never quite managed to give you a sense of what it is that professional mathematicians actually do. Today, I’m going to try to do just that, using a problem I came across recently as an example. I’ll describe my own process from beginning to end; keep in mind that this process may be quite different for other mathematicians.

Let’s start with the problem choice. I was at a workshop on entrepreneurship where the workshop leader made us play a simple game. We were given a bag with 10 light and 10 dark chocolates, and had to call the type of chocolate we were going to draw next. We always got to keep the chocolate that we drew, and if our guess was correct, we also got to continue playing; otherwise, the game ended there. This game was used as an example of a fully predictable situation. The best strategy also seemed pretty clear: guess at random if there are as many light chocolates as dark ones, and guess the more abundant kind when the numbers are unequal, so as to stay in the game longer. When I saw this game, though, a new question immediately came to my mind: what was the expected value of this game? In other words, if I value a chocolate at a dollar, how much should I be willing to pay to play the game?

I took out a pencil and a sheet of paper and within a few minutes, I was able to calculate the value of the game, which was slightly above 2 dollars. The calculation was pretty simple; it used a two-step recursion: the value of the game with equally many chocolates of each kind is related to the value after one chocolate of either kind is removed (the two resulting values are equal, since all the chocolates are worth the same), and that value is in turn related to the value of the game with one less chocolate of each kind. While I was glad to be able to solve this particular question, another one immediately came to mind. What if I valued dark chocolates more highly than light chocolates? In that case, my recursion showed something slightly unexpected: the optimal strategy is still to guess the more abundant type of chocolate, but if there are equally many of each kind, it’s better to guess light even though I prefer dark. This becomes obvious when there is only one of each kind, but less obvious when larger numbers are involved.
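
For the curious, here is a small Python sketch of that recursion for the equal-value case, under my reading of the rules stated above (you keep every chocolate you draw, and you keep playing only while your guesses are correct); with 10 chocolates of each kind it reproduces the figure of slightly above 2 dollars.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(light: int, dark: int) -> float:
    """Expected number of chocolates you end up with, playing optimally.

    You guess a kind, draw a chocolate uniformly at random, keep it,
    and continue only if your guess was correct. With equal values per
    chocolate, the optimal guess is simply the more abundant kind.
    """
    total = light + dark
    if total == 0:
        return 0.0
    best = 0.0
    if light > 0:   # guess "light": continue only if the draw is light
        best = max(best, light / total * value(light - 1, dark))
    if dark > 0:    # guess "dark": continue only if the draw is dark
        best = max(best, dark / total * value(light, dark - 1))
    return 1.0 + best   # the drawn chocolate is kept no matter what

print(value(10, 10))    # roughly 2.04, i.e. slightly above 2 dollars
```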

At this point, the game had piqued my interest enough for me to try to find a general formula for the value of the game for different relative preferences over light and dark chocolates. At first I computed some values by hand, but couldn’t see a pattern. I sent the problem with my preliminary results to a few of my fellow mathematicians, but none of them saw a pattern either. Then I decided to ask MAPLE, the symbolic algebra software, for help. It didn’t find a simple pattern, but by working with it, I eventually saw a pattern myself, which it helped me confirm. I then checked by hand that the formula I had found satisfied the recursion. At that point I was ready to write down a proof of my formula, involving recursion and some case analysis.

Interestingly, the formula involved a famous sequence of integers known as the Catalan numbers; they count various mathematical objects, such as tied two-candidate elections where one candidate is never behind the other in the partial counts, the ways a polygon can be divided into triangles by joining vertices, and the ways an expression involving only one operation such as addition or multiplication can be bracketed. I was intrigued by that and had a hunch that something interesting would happen in a more general scenario, say with three kinds of chocolates involved. This turned out to be the case.
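
As an aside for readers meeting them for the first time: the n-th Catalan number equals the binomial coefficient (2n choose n) divided by n + 1, which is easy to play with in Python. This is the standard closed form, not the formula from my chocolate problem, which I am keeping for the paper.

```python
from math import comb

def catalan(n: int) -> int:
    """The n-th Catalan number, via the standard closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(8)])   # [1, 1, 2, 5, 14, 42, 132, 429]
```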

There were actually two ways of generalizing it, of which I picked the more natural one (where you had to guess the right kind among the three to stay in the game). Strategy-wise, the same result turned out to be true; namely, you should always guess the most abundant chocolate, breaking ties in favor of the one you like the least. The new formula, discovered with the help of MAPLE and a lot of trial and error, involved a new sequence of numbers that did not seem to be known, which I tentatively called the poly-Catalan numbers. Proving the formula turned out to be rather challenging, not only because of a much larger number of cases to analyze, but also because of the complexity of the formula itself. However, I finally managed.

I believe my work on this problem illustrates several common approaches to mathematics. First, I found an interesting situation which I could turn into an equally interesting mathematical problem. Second, I spent time figuring out a solution to this problem, unsuccessfully at first, and successfully later when I got a computer involved. Third, I found an interesting feature of the problem’s solution that made me want to generalize it. Fourth, I developed the tools needed to solve the more general version, and discovered a previously unknown sequence of integers. Finally, I wrote up my solutions, and I’m currently putting the final touches on the paper so I can try to get it published in a journal. In the process, I also raised a number of new questions that I hope will be answered by other mathematicians someday.

As for whether this kind of work is actually useful, I don’t know for sure. It was certainly useful for me on many levels: it taught me to be more patient, made me learn some new tools, and involved the help of a computer. The new sequence of numbers I discovered in the process may well turn out to be important in a completely different area of mathematics or the sciences; however, even if it doesn’t, I really enjoyed this work, and ultimately, that’s the most important justification for doing it.

The Missing Equation

Dear readers,

For many people, the word “mathematics” immediately evokes equations. Equations certainly stand for something mathematics does very well – take several quantities in symbolic form and join them together into a single, concise yet powerful statement. Equations can provide tremendous new insights into the way the quantities relate to each other, and revolutionize subsequent developments in science.

This is precisely the argument made by the fantastic mathematician and writer Ian Stewart in a recently published book. The book describes the meaning of 17 equations that literally changed the world. Six of them are simply definitions – logarithms, derivatives, imaginary numbers, the normal distribution, the Fourier transform, and Shannon’s information entropy all fall into this category. Three others uncover relationships between mathematical quantities – the Pythagorean theorem, Euler’s formula, and the logistic growth model. Seven others are fundamental physical equations – Newton’s law of gravitation, the wave equation, the Navier-Stokes equation, Maxwell’s equations, Einstein’s famous E = mc², and the Schrödinger equation, as well as the second law of thermodynamics, which is actually an inequality. The last equation is the Black-Scholes equation from economics, whose incorrect application arguably contributed to the recent financial crisis; I’m looking forward to reading more on this soon.

While this selection of influential equations is necessarily limited, I was somewhat disappointed not to see any equations from my field, mathematical biology. It is true that there aren’t as many equations in mathematical biology as there are in physics, and their role is somewhat more limited. But there is one equation which I believe deserves a mention, perhaps in the next edition of Ian Stewart’s book: the defining equation of the Hardy-Weinberg principle, p² + 2pq + q² = 1. Incidentally, the first part of the principle’s name comes from the same Hardy whom I quoted in an earlier post stating his pride in the fact that none of his work in mathematics would ever have any applications. After explaining what the equation means, I’ll show how it applies to blood types, forensics, and the search for genetic markers of diseases, in spite of Hardy’s claim. Weinberg was a German scientist who discovered the principle independently of Hardy at around the same time, as seems to happen so frequently with good ideas.

Suppose that we have a single human genetic locus (position), which can have either one of two alleles (forms). These two alleles are commonly denoted by A and a; for convenience, A is called dominant and a, recessive. Since every person inherits one copy of the gene from their father and one from their mother, their genotype (underlying genetic makeup) can be either AA (homozygous dominant), Aa (heterozygous), or aa (homozygous recessive). The Hardy-Weinberg principle tells us what happens if the allele frequencies do not differ between males and females, and the mating happens randomly (to be contrasted with assortative mating, where individuals prefer to mate with those whose genes are similar to their own). In this case, if p is the frequency of allele A and q = 1 – p is the frequency of allele a, then the frequency of AA is p², that of Aa is 2pq (the factor of 2 is needed because we ignore the order, i.e. Aa and aA are the same), and that of aa is q². The equation tells us that these frequencies add up to 1; the principle itself tells us that these frequencies do not vary from one generation to the next under the conditions of random mating, provided there is no mutation, migration, or natural selection happening.
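
As a tiny numerical illustration in Python (the allele frequency of 0.3 below is an arbitrary value I picked, not data from any real population):

```python
def hardy_weinberg(p: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg for a two-allele locus.

    p is the frequency of the dominant allele A; q = 1 - p is that of a.
    """
    q = 1.0 - p
    return {"AA": p ** 2, "Aa": 2 * p * q, "aa": q ** 2}

freqs = hardy_weinberg(0.3)
print(freqs)                 # roughly {'AA': 0.09, 'Aa': 0.42, 'aa': 0.49}
print(sum(freqs.values()))   # 1.0 (up to floating-point rounding), as p² + 2pq + q² = 1 promises
```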

This idea can be generalized to more than two alleles. For instance, the blood type can be described as a trait with three alleles, A, B and O, with O being recessive. This means that there are four possible blood phenotypes: A (which could come from genotypes AA or AO), B (which could come from BB or BO), AB or O (which could only come from OO because O is recessive). Knowing the frequencies of the A, B and O alleles (say, p, q and r) would allow us to calculate the frequencies of each blood type: p² + 2pr, q² + 2qr, 2pq and r², respectively (you can easily check that they add up to 1 if p + q + r = 1). In the same way, knowing the frequencies of each blood type can give us the frequencies of each allele. We can also test for significant deviations from the equilibrium frequencies in a population to see if the mating between people of different blood types happens randomly. For example, if we look at Canada, where the fractions are 46%, 42%, 9% and 3% for each of the 4 blood types, we get estimates of p, q and r that are not consistent with the Hardy-Weinberg law, and conclude that the mating is not random.
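
Here is a rough sketch of how one might carry out that kind of check in Python, using the fractions quoted above in the order the phenotypes are listed; the estimation method (taking the square root of the O fraction to get r, then solving for p and q) is one simple textbook approach, and a proper analysis would use a statistical test rather than just eyeballing the sums.

```python
from math import sqrt

def abo_allele_estimates(freq_A: float, freq_B: float, freq_AB: float, freq_O: float):
    """Estimate ABO allele frequencies from phenotype fractions, assuming Hardy-Weinberg.

    Under the principle, freq(O) = r², freq(A) = p² + 2pr = (p + r)² - r²,
    and freq(B) = q² + 2qr = (q + r)² - r², which we can solve directly.
    """
    r = sqrt(freq_O)
    p = sqrt(freq_A + freq_O) - r
    q = sqrt(freq_B + freq_O) - r
    return p, q, r

# Phenotype fractions quoted above, in the order listed (A, B, AB, O).
p, q, r = abo_allele_estimates(0.46, 0.42, 0.09, 0.03)
print(f"p = {p:.3f}, q = {q:.3f}, r = {r:.3f}, p + q + r = {p + q + r:.3f}")
print(f"predicted AB fraction 2pq = {2 * p * q:.3f} vs observed 0.09")
# If p + q + r is far from 1 (or 2pq far from the observed AB fraction),
# the population deviates from Hardy-Weinberg proportions.
```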

When forensics has to answer the frequently asked question of how likely a person is to have the same genetic variants as the person who committed a certain crime, the Hardy-Weinberg principle plays an important role. It is typically used to justify the assumption that the probability of a person having a given set of genetic variants is the product of the individual probabilities (frequencies) of the variants. However, in some specific cases, this assumption may not be justified because of deviations from the equation we have been discussing, and this may (at least in principle) lead to some false convictions.

One final application of the Hardy-Weinberg equation comes from the search for the genetic causes of complex diseases. In this situation, we typically look at a large number of people who have the disease (called cases) and a large number of people who don’t have it (called controls). A genetic variant that deviates from the Hardy-Weinberg equation in the cases, but not in the controls, is very likely to be associated with the disease. Of course, after some spurious discoveries in the early days after the Human Genome Project was completed, the association is now considered to be confirmed only when it has been found in a group of cases distinct from the initial group (called a replication cohort). As for the variants that deviate from the Hardy-Weinberg equation in the controls, they are usually discarded from the analysis as unreliable. In this way the equation is an important tool in the analysis of GWAS, or genome-wide association studies, the tool of choice for the discovery of genetic disease markers.
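
To give a flavor of what such a check might look like, here is a simple chi-square version in Python; the genotype counts are invented for illustration, and a real GWAS pipeline would typically use an exact test and a much stricter significance threshold.

```python
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Allele frequencies are estimated from the genotype counts themselves;
    the statistic has 1 degree of freedom, so values above ~3.84 indicate
    a deviation at the usual 5% level.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)       # estimated frequency of allele A
    q = 1.0 - p
    expected = {"AA": n * p * p, "Aa": 2 * n * p * q, "aa": n * q * q}
    observed = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in expected)

# Hypothetical genotype counts for one variant in cases and in controls:
# the invented cases deviate strongly, while the invented controls fit exactly.
print("cases:   ", round(hwe_chi_square(180, 240, 180), 2))
print("controls:", round(hwe_chi_square(160, 480, 360), 2))
```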

I hope I convinced you that the Hardy-Weinberg equation deserves to be included among the most influential equations of our times. What other equations would you suggest adding to Ian Stewart’s list?