## Ages of Three Daughters

My uncle’s in town and just reminded me of a puzzle that he’d asked me over ten years ago. A census worker goes to a house and the person who answers the door tells the census worker that she has three daughters.

“I need to know their ages,” replies the census worker.

“Well, the product of their ages is my age: 36.”

“That’s not enough information.”

“Well, the sum of their ages is the same as the number as the house on the right.”

The census worker checks the house next door and responds, “That’s still not enough information.”

“My eldest daughter is asleep upstairs.”

The census worker thanks the homeowner and leaves. How old are the three daughters?

I love this puzzle because on the face of it, it would appear that there isn’t enough information to figure out the puzzle. If you want to figure out the puzzle yourself, do not read beyond this point.

The first thing we learn about the ages of the daughters is that the product of the ages is 36.

Let’s consider all such possibilities (let’s assume the puzzle maker only allows for integer ages):

1,1,36
1,2,18
1,3,12
1,6,6
1,4,9
2,2,9
2,3,6
3,3,4

As the census worker notes, there are still too many to disambiguate the ages.

The next clue is that the sum of the ages is the same as the number of the house next door. We don’t know the sum, but the census worker does, and we also know that the house number is insufficient to disambiguate the ages. Let’s look at the sums of the factorizations above:

1+1+36 = 38
1+2+18 = 21
1+3+12 = 16
1+6+6 = 13
1+4+9 = 14
2+2+9 =13
2+3+6 = 11
3+3+4 = 10

Note that the only sum that is not unique above is 13, so that is the only possibility in which the second clue would not have been enough for the census worker to disambiguate the ages. This leaves two possibilities:

1,6,6
2,2,9

The final clue helps with that one. If there’s an eldest daughter (let’s assume the puzzle maker created a world in which twins are considered to be the same age and that no two daughters who aren’t twins were born within twelve months of each other), then the only viable solution is 2,2,9.

Posted in Information Theory, Puzzle | 1 Comment

## Le Poisson

It was the spring of 2003; Toby Berger and I were both in the common room of Phillips Hall, and there was half an hour before his information theory class was set to start. Toby was advising my honors thesis, a project that was exploring mathematical models of how neurons transmit information. I approached Toby to ask about one of the assumptions he had made in his model, and a few minutes later, he took out a sheet of paper and started deriving an analogue of the central limit theorem for Poisson processes. I tried keeping up as he wrote down the steps, justified assumptions where he needed to, and eventually yielded a beautiful result. I left that discussion wanting the power to do what he had just done.

What Toby had done was derive a model for the probability distribution of the inter-arrival times between the spikes of a neuron, and he was explaining the rationale for one part of this model. Specifically, he was trying to justify to me how even though the inter-arrival times of individual neural spikes do not behave like a Poisson process (due to a refractory period), we could look at the afferent spike arrivals into a neuron as Poisson.

I referenced Toby’s derivation in some unpublished work during grad school, but my thoughts never returned to it until a holiday party this past week. During the party, someone posed a question that made me think about the frequency with which a person does an activity, and my mind settled upon Toby’s Poisson limit derivation. Like the spikes of a neuron, an activity, e.g. brushing one’s teeth, can be modeled as a renewal process. We were specifically talking about the distribution of inter-arrival times and how the average per person behavior over a population might compare against an individual’s activity.

I was positing cases in which the distribution of an individual’s activity might be bimodal. For instance, some people might brush their teeth twice a day while others might only brush once a day. The limiting argument then immediately started nagging at me because the inter-arrival distribution of a Poisson process isn’t bimodal, and I wanted to understand how the limiting argument would collapse the distribution into a single mode.

## Toby’s Derivation (Asymptotic)

Toby’s argument is a good starting point to think about this problem. Let’s consider $m$ renewal processes. Each process could represent the spikes of a single neuron or the times at which a specific person brushes his or her teeth. Given some fixed time $t$, we can model the time to the next arrival as non-negative random variables

$X_1, \ldots, X_m \sim F_X,$

where $F_X$ is a cumulative distribution function that we will assume is continuously differentiable over $[0, \infty)$ and where the probability density function $f_X(x)$ has the property that $f_X(0) > 0$. I am going to wave my hands that this last assumption isn’t so severe if we assume that the processes are out of phase with one another with respect to independent random phases and that is an assumption one would want to make, anyway.

Let’s superimpose all of the events of these $m$ renewal processes on a single timeline. I will now derive the result as $m \to \infty$, the superimposed process will look Poisson. By definition, this means that the inter-arrival times will look exponentially distributed. To see this, note that we can write the distribution of the time until the next arrival from our starting point $t$ as

$Z_m = m \cdot \min\{X_1, \ldots, X_m \}~.$

The multiplicative scaling factor $m$ in $Z_m$ above can be thought of as a per process averaging term since as we superimpose more processes, the points become closer together.

The goal is to argue that $Z_m$ is exponentially distributed. To do this, we can decompose the probability as follows:

1. By definition, $\mathbb{P}(Z_m \geq z) =\mathbb{P}(X_1 \geq z/m, \ldots, X_m \geq z/m )$
2. By independence, $\mathbb{P}(Z_m \geq z) = \prod_{i=1}^m \mathbb{P}(X_i \geq z/m)$
3. By identical distributedness, $\mathbb{P}(Z_m \geq z) = (1 - F_X(z/m))^m$
4. Taking the limit as $k \to \infty$ via L’Hospital’s rule, we have that $\lim_{k \to \infty} \mathbb{P}(Z_m \geq z) = e^{-z f_X(0)}$
5. This limit is exponentially distributed, indicating that our inter-arrival times are exponentially distributed, so by definition the limiting arrival process is Poisson.

## Finite Derivation

Let’s return to step 3 in Toby’s derivation, which is just before one takes the limit. We can rewrite this as

$\mathbb{P}(Z_m \geq z) = e^{m \ln (1 - F_X(z/m))}~.$

We can now apply one of the fundamental inequalities of information theory to the equation above: $\ln x \leq x - 1$, which can also be written as $\ln x \geq 1 - 1/x$. By monotonicity of the exponent, our inequality gives the upper and lower bounds

$e^{-\frac{m F_X(z/m)}{1 - F_X(z/m)}} \leq \mathbb{P}(Z_m \geq z) \leq e^{-m F_X(z/m)}~.$

Note that the denominator in the exponent of the lower bound $1 - F_X(z/m)$ converges to $1$ as $m \to \infty$. Furthermore, we see that as $m \to \infty$, the quantity $m F_X(z/m)$ approaches $z f_X(0)$, which results in the exponential limit of the inter-arrival distribution of both bounds. If we examine this quantity more closely, $m$ dilates $m F_X(z/m)$ so that any modes in the distribution are effectively pushed closer together around $0$, and it’s directly connected to the fact that we scale the $\min$ operation by the multiplicative factor $m$.

## High Throughput

In late 2011, I gave cheek swabs to National Geographic to trace my genetic genealogy. The samples looked at markers from my Y chromosome and mitochondrial DNA: the first was passed down from my father from his father from his father up through the generations, and the second from my mother from her mother from her mother etc. The markers would help me learn about those ancestral lines. It took a couple months for the results to come back, and this puzzled me.

A few years prior, I was at a center that was sequencing DNA at surprisingly fast rates, so I couldn’t understand why my sequences would take so long. I took follow-up tests with Family Tree DNA to better understand my genetic genealogy, but these were just as slow.

Meanwhile, the fast sequencing that I had observed was being commercialized by companies like Illumina, which provide “high throughput” sequencing technology. One well-known technique involves reading fragments of DNA and reassembling it into the original genome in a manner akin to solving a jigsaw puzzle. A couple nights ago, I was having dinner with a friend who worked at one of these high throughput sequencing companies, and I asked how “high throughput” these jigsaw-puzzle solvers are.

“The throughput is roughly one genome per day,” he responded. The entire genome? Yup. Even eukaryotes? Yup. And they handle multiple chromosomes? Yup. Wouldn’t my tests only take a tiny fraction of that time? Yup. Were the companies I sent this to using high throughput sequencing technology? Yup. Then what was going on?

My tests amounted to genotype matching, which my friend confirmed was a simpler process and would take less time; however, the technology requires samples to be prepped according to the type of test. While the prep work itself may only take a few hours, running a test on my sample simply isn’t cost-effective by itself. In order to achieve economies of scale, it would be best to batch my test in with similar ones.

“Why not just sequence the entire genome and do post-processing in the cloud?” This, it turned out, was the holy grail, and companies were working on new ways to improve the speed.

Were they simply building faster jigsaw-puzzle solvers? No. The algorithms for assembling the genomes were fairly mature, so the real gains have come from finding ways to set up the prep so that it can provide side information in the later assembly. In one such example, Illumina had acquired the start-up Moleculo, which had found a clever way to barcode the preps for faster assembly. Moleculo consisted of a mix of molecular biologists and computer scientists, with the innovations happening at the interface between the fields.

## Passing Notes in Class

I just got back from a four-day camping trip for the Fourth. While roasting marshmallows and hiking along trails, I managed to fall into a few puzzles. Justin posed a puzzle that we’d formulated a while ago: a multi-person variation on the unreliable postal service puzzle. At some later point, I brought up The League Problem. On the ride back, Amit mentioned the water jug puzzle and tried explaining his prior work on inaccessible cardinals to me.

I came home to good news about some of my prior work. Specifically, a puzzle that I like to call Passing Notes in Class had been posted while I was away: Alice and Bob want to pass notes to each other in class, but they don’t want the teacher to notice (or at the very least, not catch them very often). If Alice and Bob can both see how much the teacher is paying attention to them, and the rustling of notes between the two will make the teacher more likely to look their way in the future, what is the optimal tradeoff between passing as much information as possible between the two of them while limiting the number of times they get caught by the teacher?

## Fool Me Once

I recently started watching Penn and Teller’s Fool Me, a show in which magicians try to fool Penn and Teller for a shot to perform in Vegas. After one of the episodes, there was a trick that I couldn’t even begin to understand, so I went looking for it online. Instead of getting a definitive answer, I found a web page with a bunch of guesses that had degenerated into a flame war. The flame war itself reminded me of an xkcd comic, but one of the comments mentioned a separate trick that had fooled Penn and Teller, a supposed mathematical trick performed by Graham Jolley.

The comment didn’t describe Graham Jolley’s trick, but it was pretty easy to find and watch. After experimenting with a deck of cards along with a pencil and some scratch paper, I’ve come up with one possible implementation.

The Effect

The magician takes out a deck of cards and has two participants sit on each side. Participant A is asked to cut about a third of the way into the deck and look at the bottom card. Participant B is asked to cut about half way into the remaining deck and look at the bottom card. The magician then recollects the card stack from Participant A and Participant B, placing them on top of the deck, respectively.

The magician then spreads the deck face down to reveal two jokers facing up. The jokers are taken out, and the magician holds each one close, as if it were transmitting where in the deck to look. In fact, the magician states that the first card will be found $A$ cards into the deck, and the other will be found $B$ cards in. When the magician counts through the deck and sets aside the cards at positions $A$ and $B$, they turn out to be the ones for Participant A and Participant B, respectively.

One Possible Method

The method I came up with assumes that Participant A will draw a card roughly 1/3 of the way into the deck, and Participant B draws a card roughly halfway into the remaining deck or roughly 2/3 of the way into the original deck.

Let’s start with a deck containing $N$ cards (not counting the jokers) with the first joker between cards $i$ and $i+1$, and the second between cards $j$ and $j+1$:

$1, \ldots, i, \|JOKER\|, i+1, \ldots, j, \|JOKER\|, j+1, \ldots, N$

We’ll now proceed with some definitions and assumptions

$N_A$ – the initial position of the card drawn by Participant A
$N_B$ – the initial position of the card drawn by Participant B
$i < N_A < j < N_B$

This would mean that the arrangement of the deck would be as follows:

$1, \ldots, i, \|JOKER\|, i+1, \ldots, N_A, \ldots, j, \|JOKER\|, j+1, \ldots, N_B, \ldots, N$

The magician achieves this by having Participant A get a card from near 1/3 of the way into the deck and placing the first joker well before 1/3 of the way in. Likewise for the second joker ($j = N/2$ would be the safest choice).

When Participants A and B replace their stacks on the top of the deck, Participant A goes first, so the order upon this reshuffle looks as follows:

$N_{A}+1, \ldots, j$, $\|JOKER\|, j+1, \ldots, N_B$, $1, \ldots, i$, $\|JOKER\|, i+1, \ldots, N_A$, $N_{B}+1, \ldots, N$

Note that the first group of cards before the joker

$N_{A}+1, \ldots, j$

contains $j - N_A$ cards. The second group of cards (between the first and second joker)

$j+1, \ldots, N_B, 1, \ldots, i$,

contains Participant B’s card $i$ positions from the end, and the third group

$i+1, \ldots, N_A, N_{B}+1, \ldots, N$

contains Participant A’s card $N_A - i$ cards from the beginning.

Thus, by moving the third group between the first and third (this happens when removing the jokers), Participant A’s card lands $j - i$ positions from the start, and Participant B’s lands $i$ cards from the end.

For a standard deck, $N = 52$, let $j = 26$, $i = 6$, and Participant A’s card will land at position 20, and Participant B’s will land at position 46.

## Complexity and Asymptotes

A friend pointed out to me that the statement that the final divide-and-conquer Fibonacci algorithm from the previous post could run in $O(\log N)$ time was a bit misleading. The objection was that I had assumed that the matrix multiplication would not depend on $N$, but that this in fact not the case at all. Namely, if the numbers being added get large, neither the addition nor the multiplication operations of the matrix multiplication cannot be assumed to be constant.

In fact, the earlier analysis of Fibonacci indicated that the value of $\text{fib}(N)$ could be bounded above and below by $2^{N/2} < \text{fib}(N) < 2^{N}$, so each of the roughly $\log N$ matrix multiplication amounts to 10 multiplication operations and 4 addition operations, which we can bound above as the addition or multiplication on order-of- $N$ -bit numbers. By contrast, the iterative algorithm, which I had claimed was linear-time $O(N)$, requires only one addition operation at each step, resulting in an order of $N$ addition operations on order-of- $N$ -bit numbers.

If we consider the complexity of the arithmetic operations, the addition of two $N$ bit numbers can be completed in $O(N)$ time, so the revised complexity of the iterative algorithm becomes $O(N^2)$.

For the divide-and-conquer algorithm, we have to consider the complexity of both multiplication operations and addition operations. Suppose we opt for the Schönhage–Strassen algorithm, which has complexity $O(N \log N \log \log N)$. In this case, the multiplication operations dominate the addition ones, and the revised overall complexity of the algorithm becomes $O(N \log^2 N \log \log N)$, which is still faster than the $O(N^2)$ iterative algorithm asymptotically. However, I wonder if there is an intermediate region before that asymptotic behavior takes effect in which the iterative approach completes faster.

## Code Monkey

During my years as a student, I sometimes encountered a disdain in others for writing code. In some cases, the term “code monkey” would get used against someone who enjoyed writing code. Something never quite felt right about that term. Recently, I went to a talk by Donald Knuth, who said something that resonated with me. I am paraphrasing it here: “Some people say that you truly understand something when you can explain it to a child. I’d say that you truly understand something when you can explain it to a computer.”

I took a programming class in high school during which we had to write a program to compute the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, etc. The way to get the next number in the sequence was to add the previous two numbers in the sequence. In my mind, one that had just digested the idea of recursion, this was relatively simple to explain to a computer:

int fib(int N) {
if (N == 1) return 1;
if (N == 2) return 2;
return fib(N - 1) + fib(N - 2);
}

There was only one problem with my explanation: it would take a computer exponential time in $N$ to run the code, denoted as $O(2^N)$. One way to see it is to notice that the number of calls to the function doubles for every $N > 2$, resulting in a binary tree with the minimum depth to a leaf of $\frac{N}{2}$ and a maximum depth of $N$. Another way to see it is to compile the code and use it to try to compute $\text{fib}(50)$ and then wait… and wait… and wait.

The Fibonacci sequence grows exponentially, and a naive recursion effectively counts up one-by-one. Thus, such an algorithm grows exponentially, as well.

I got tired of waiting, and after studying computer science a while longer, I learned how to explain Fibonacci to a computer that would take linear time, i.e. $O(N)$:

int fib(int N) {
int a = 0;
int b = 1;
for (int i = 1; i < N; ++i) {
int temp = b;
b += a;
a = temp;
}
return b;
}

Now I didn’t have to wait for $\text{fib}(50)$, but I would for $\text{fib}(2^{50})$. This improved my understanding of the Fibonacci sequence, but at the time, my understanding was quite limited: I hadn’t taken a linear algebra class yet. Once I had that and some signal processing classes under my belt, I learned that the Fibonacci sequence could be solved in closed-form with a z-transform. Alternatively, one could arrive at the closed-form solution by recognizing that the Fibonacci sequence could be represented as the product of an exponentiated matrix with a vector, and all one had to do was decompose that matrix into its eigenvalues/eigenvectors:

$\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right)$ = $\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^N \left(\begin{array}{c} 1 \\ 0 \end{array}\right)$

There was still a problem, though. The eigenvalues of the Fibonacci matrix are irrational, and the closed-form solution contains irrational terms, so explaining this to a computer could result in floating point errors:

$\text{fib}(N) = \frac{(\frac{1 + \sqrt{5}}{2})^N - (\frac{1 - \sqrt{5}}{2})^N}{\sqrt{5}}$

By this time, I was well versed in MATLAB, and this wonderful vector-matrix notation could be fed right in, but perhaps I didn’t understand what was going on as well as I thought.

Some years went by before I took a closer look at the above expression and noticed something: the computation of the closed-form expression involves two quantities that need to be exponentiated, which can be done using a divide-and-conquer algorithm in $O(\log N)$, i.e. logarithmic time. Is there a way to do this that doesn’t involve floating-point arithmetic?

It turns out the approach is pretty simple: one can apply the same divide-and-conquer technique for exponentiating a number to exponentiating a matrix. The divide step is easy to see when $N$ is even:

$\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right)$ = $\left(\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^2\right)^{N/2} \left(\begin{array}{c} 1 \\ 0 \end{array}\right)$

If N is odd, the same idea holds:

$\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right)$ = $\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)\left(\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^2\right)^{(N-1)/2} \left(\begin{array}{c} 1 \\ 0 \end{array}\right)$

Given this, it isn’t too hard to explain to a computer how to compute the Fibonacci sequence in $O(\log N)$ time without resorting to floating-point arithmetic:

int fib(int N) {
Matrix<int> m(1,1,1,0);
Vector<int> v(1,0);
Vector<int> v = fib_helper(N, m, v);
return v[0];
}
Vector fib_helper(int N, Matrix<int> m, Vector<int> v) {
// Base case
if (N == 1) {
// Multiples the matrix and the vector.
return MultiplyMatrixVector(m, v);
}
// Even case
if (N % 2 == 0) {
// Squares the matrix.
return fib_helper(N / 2, SquareMatrix(m), v);
}
// Odd case
return MultiplyMatrixVector(
m, fib_helper(N - 1, m, v));
}

Are there any cases in which programming has helped you understand a concept better? Are there any cases in which understanding a concept has made you a better programmer?

Posted in Programming, Signal Processing | 1 Comment