Complexity and Asymptotes

A friend pointed out to me that the statement that the final divide-and-conquer Fibonacci algorithm from the previous post could run in O(\log N) time was a bit misleading. The objection was that I had assumed that the cost of the matrix multiplication would not depend on N, but this is in fact not the case at all. Namely, if the numbers being added get large, neither the addition nor the multiplication operations of the matrix multiplication can be assumed to take constant time.

In fact, the earlier analysis of Fibonacci indicated that the value of \text{fib}(N) could be bounded above and below by 2^{N/2} < \text{fib}(N) < 2^{N}, so the numbers involved have on the order of N bits. Each of the roughly \log N matrix multiplications amounts to 10 multiplication operations and 4 addition operations, each of which we can bound above as an addition or multiplication on order-of-N-bit numbers. By contrast, the iterative algorithm, which I had claimed was linear-time O(N), requires only one addition operation at each step, resulting in an order of N addition operations on order-of-N-bit numbers.

If we consider the complexity of the arithmetic operations, the addition of two N-bit numbers can be completed in O(N) time, so the revised complexity of the iterative algorithm becomes O(N^2).

For the divide-and-conquer algorithm, we have to consider the complexity of both multiplication operations and addition operations. Suppose we opt for the Schönhage–Strassen algorithm, which multiplies two N-bit numbers in O(N \log N \log \log N) time. In this case, the multiplication operations dominate the addition ones, and since there are roughly \log N matrix multiplications, the revised overall complexity of the algorithm becomes O(N \log^2 N \log \log N), which is still faster than the O(N^2) iterative algorithm asymptotically. However, I wonder if there is an intermediate region before that asymptotic behavior takes effect in which the iterative approach completes faster.

Posted in Programming

Code Monkey

During my years as a student, I sometimes encountered a disdain in others for writing code. In some cases, the term “code monkey” would get used against someone who enjoyed writing code. Something never quite felt right about that term. Recently, I went to a talk by Donald Knuth, who said something that resonated with me. I am paraphrasing it here: “Some people say that you truly understand something when you can explain it to a child. I’d say that you truly understand something when you can explain it to a computer.”

I took a programming class in high school during which we had to write a program to compute the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, etc. The way to get the next number in the sequence was to add the previous two numbers in the sequence. In my mind, one that had just digested the idea of recursion, this was relatively simple to explain to a computer:

int fib(int N) {
  if (N == 1) return 1;
  if (N == 2) return 1;  // the sequence starts 1, 1, 2, 3, ...
  return fib(N - 1) + fib(N - 2);
}

There was only one problem with my explanation: it would take a computer exponential time in N to run the code, denoted as O(2^N). One way to see it is to notice that the number of calls to the function doubles for every N > 2, resulting in a binary tree with the minimum depth to a leaf of \frac{N}{2} and a maximum depth of N. Another way to see it is to compile the code and use it to try to compute \text{fib}(50) and then wait… and wait… and wait.

The Fibonacci sequence grows exponentially, and a naive recursion effectively counts up one-by-one. Thus, such an algorithm grows exponentially, as well.
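To make the "counts up one-by-one" remark concrete, here is a small sketch that counts how many function calls the naive recursion makes. The name CountCalls is made up for this illustration, and it assumes base cases at N = 1 and N = 2, matching the sequence 1, 1, 2, 3, ... given above.

```cpp
#include <cassert>
#include <cstdint>

// Number of calls the naive recursive fib makes for input N,
// assuming it stops recursing at N == 1 and N == 2.
std::uint64_t CountCalls(int N) {
  assert(N >= 1);
  if (N <= 2) return 1;  // a base case costs exactly one call
  // This call, plus every call made in the two subtrees.
  return 1 + CountCalls(N - 1) + CountCalls(N - 2);
}
```

The count works out to 2 \cdot \text{fib}(N) - 1, so the number of calls grows at the same exponential rate as the sequence itself.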

I got tired of waiting, and after studying computer science a while longer, I learned how to explain Fibonacci to a computer in a way that would take linear time, i.e. O(N):

int fib(int N) {
  int a = 0;
  int b = 1;
  for (int i = 1; i < N; ++i) {
    int temp = b;
    b += a;
    a = temp;
  }
  return b;
}

Now I didn’t have to wait for \text{fib}(50), but I would for \text{fib}(2^{50}). This improved my understanding of the Fibonacci sequence, but at the time, my understanding was quite limited: I hadn’t taken a linear algebra class yet. Once I had that and some signal processing classes under my belt, I learned that the Fibonacci sequence could be solved in closed-form with a z-transform. Alternatively, one could arrive at the closed-form solution by recognizing that the Fibonacci sequence could be represented as the product of an exponentiated matrix with a vector, and all one had to do was decompose that matrix into its eigenvalues/eigenvectors:

\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right) = \left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^N \left(\begin{array}{c} 1 \\ 0 \end{array}\right)

There was still a problem, though. The eigenvalues of the Fibonacci matrix are irrational, and the closed-form solution contains irrational terms, so explaining this to a computer could result in floating point errors:

\text{fib}(N) = \frac{(\frac{1 + \sqrt{5}}{2})^N - (\frac{1 - \sqrt{5}}{2})^N}{\sqrt{5}}

By this time, I was well versed in MATLAB, and this wonderful vector-matrix notation could be fed right in, but perhaps I didn’t understand what was going on as well as I thought.

Some years went by before I took a closer look at the above expression and noticed something: the computation of the closed-form expression involves two quantities that need to be exponentiated, which can be done using a divide-and-conquer algorithm in O(\log N), i.e. logarithmic time. Is there a way to do this that doesn’t involve floating-point arithmetic?

It turns out the approach is pretty simple: one can apply the same divide-and-conquer technique for exponentiating a number to exponentiating a matrix. The divide step is easy to see when N is even:

\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right) = \left(\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^2\right)^{N/2} \left(\begin{array}{c} 1 \\ 0 \end{array}\right)

If N is odd, the same idea holds:

\left(\begin{array}{c} \text{fib}(N+1) \\ \text{fib}(N) \end{array}\right) = \left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)\left(\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)^2\right)^{(N-1)/2} \left(\begin{array}{c} 1 \\ 0 \end{array}\right)

Given this, it isn’t too hard to explain to a computer how to compute the Fibonacci sequence in O(\log N) time without resorting to floating-point arithmetic:

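// Matrix<int>, Vector<int>, MultiplyMatrixVector, and SquareMatrix are
// assumed to be 2x2 helpers defined elsewhere.
// Returns m^N * v for N >= 1.
Vector<int> fib_helper(int N, Matrix<int> m, Vector<int> v) {
  // Base case
  if (N == 1) {
    // Multiplies the matrix and the vector.
    return MultiplyMatrixVector(m, v);
  }
  // Even case
  if (N % 2 == 0) {
    // Squares the matrix and halves the exponent.
    return fib_helper(N / 2, SquareMatrix(m), v);
  }
  // Odd case: peel off one factor of the matrix.
  return MultiplyMatrixVector(
      m, fib_helper(N - 1, m, v));
}
int fib(int N) {
  Matrix<int> m(1,1,1,0);
  Vector<int> v(1,0);
  Vector<int> result = fib_helper(N, m, v);
  // result holds (fib(N+1), fib(N)), so fib(N) is the second entry.
  return result[1];
}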
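For anyone who wants to compile the idea, here is a self-contained sketch of the same recursion. The Vec2 and Mat2 aliases are stand-ins I am assuming for the Matrix and Vector helpers, and 64-bit unsigned integers are used so that values like \text{fib}(50) fit; neither choice comes from the code above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the 2x2 matrix and length-2 vector helpers.
using Vec2 = std::array<std::uint64_t, 2>;  // (top, bottom)
using Mat2 = std::array<std::uint64_t, 4>;  // row-major: {a, b, c, d}

Vec2 MultiplyMatrixVector(const Mat2& m, const Vec2& v) {
  return {m[0] * v[0] + m[1] * v[1], m[2] * v[0] + m[3] * v[1]};
}

Mat2 SquareMatrix(const Mat2& m) {
  return {m[0] * m[0] + m[1] * m[2], m[0] * m[1] + m[1] * m[3],
          m[2] * m[0] + m[3] * m[2], m[2] * m[1] + m[3] * m[3]};
}

// Returns m^N * v for N >= 1 using the even/odd split.
Vec2 fib_helper(int N, const Mat2& m, const Vec2& v) {
  assert(N >= 1);
  if (N == 1) return MultiplyMatrixVector(m, v);
  if (N % 2 == 0) return fib_helper(N / 2, SquareMatrix(m), v);
  return MultiplyMatrixVector(m, fib_helper(N - 1, m, v));
}

// fib(1) = fib(2) = 1. Intended for 1 <= N <= 92, so that
// fib(N + 1) still fits in 64 bits.
std::uint64_t fib(int N) {
  const Vec2 result = fib_helper(N, {1, 1, 1, 0}, {1, 0});
  return result[1];  // the vector is (fib(N+1), fib(N))
}
```

Fixed-width integers only postpone the question of how large the numbers get; beyond N = 92 one would need an arbitrary-precision integer type.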

Are there any cases in which programming has helped you understand a concept better? Are there any cases in which understanding a concept has made you a better programmer?

Posted in Programming, Signal Processing

Voting Paradox

Inspired by the upcoming elections, I spent a little time yesterday trying to think up an example in which people could potentially have logically consistent beliefs individually but as a whole produce logically inconsistent outcomes. The result of that effort follows.

It’s election time, and there are three propositions on the ballot to spend a budget surplus. Proposition 1 is to increase funding for education. Proposition 2 is to increase funding for healthcare. However, if both Propositions 1 and 2 pass, the tax rate needs to increase to 8% to avoid a budget shortfall. Proposition 3 is designed to do just this.

There are three voters in the town. Alice is for education only, so she supports Proposition 1 but not 2 or 3. Bob is for healthcare only, so he supports Proposition 2 but not 1 or 3. Cindy wants both education and healthcare, so she supports Propositions 1, 2, and 3. While everyone believes in something logically consistent, in this scenario, both Propositions 1 and 2 pass, but Proposition 3, the tax increase, is defeated, leading to a budget shortfall.
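The tally is small enough to check by hand, but here is a sketch that encodes the three ballots and applies a simple majority to each proposition separately. The names kVotes, IsConsistent, and MajorityOutcome are made up for this illustration, and "consistent" is taken to mean "does not approve both spending measures without the tax increase", which is my reading of the scenario.

```cpp
#include <array>
#include <cassert>

// One row per voter; one column per proposition (1, 2, 3).
constexpr std::array<std::array<bool, 3>, 3> kVotes = {{
    {{true, false, false}},  // Alice: education only
    {{false, true, false}},  // Bob: healthcare only
    {{true, true, true}},    // Cindy: both, plus the tax increase
}};

// A ballot (or an outcome) is consistent if approving both
// Propositions 1 and 2 implies approving Proposition 3.
bool IsConsistent(const std::array<bool, 3>& ballot) {
  return !(ballot[0] && ballot[1]) || ballot[2];
}

// Applies a simple majority vote to each proposition on its own.
std::array<bool, 3> MajorityOutcome() {
  assert(kVotes.size() % 2 == 1);  // odd electorate, so no ties
  std::array<bool, 3> outcome{};
  for (int p = 0; p < 3; ++p) {
    int yes = 0;
    for (const auto& ballot : kVotes) yes += ballot[p] ? 1 : 0;
    outcome[p] = 2 * yes > static_cast<int>(kVotes.size());
  }
  return outcome;
}
```

Every row passes IsConsistent, while the result of MajorityOutcome does not: Propositions 1 and 2 each get two votes, and Proposition 3 gets one.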

I posed the problem above to Justin Bledin, a graduate student in the Logic Group at UC Berkeley, to find out if the idea made sense or not.

“If you’d stumbled upon this ten years ago, it would have made a nice paper,” Justin responded before pointing me to the judgment aggregation paradox, something that he had come across in a decision theory seminar.

In 2002, Christian List and Philip Pettit’s “Aggregating Sets of Judgments: An Impossibility Result” was published in Economics and Philosophy. The paper starts with an example similar in flavor to the one above and goes on to prove that a voting function will produce logically inconsistent output for certain logically consistent profiles of inputs if the voting function satisfies the following three conditions:

  1. The voting function should accept any individual’s voting profile if it satisfies certain conditions for logical consistency.
  2. The output of the voting function should be the same for any permutation of the individual voting profiles.
  3. If two propositions have the same votes in favor, then their outcome should be the same.

The paper concludes with strategies that could produce a consistent voting function if one of the rules were relaxed. One idea that comes out of the second theorem of the paper is a median-based voting method, so long as there is a way to order individual voting profiles. It would be interesting to think about how one might construct such voting systems in practice.

Posted in Papers, Puzzle

Walking Downhill

The problem asks you to show that f(\vec{x}) \geq 0. If f(x_1,x_2) = x_1^2 + x_2^2, then the solution is quite easy. Set the gradient equal to zero and solve the system of equations for \vec{x}. Since the function is convex, this is the minimum point, and the answer is simply

f(\vec{x}) \geq f(0,0) = 0.

However, one can add a wrinkle to this problem to make it more of a challenge. If f(x_1, x_2) = \log (1 + x_1^2 + x_2^2), then we no longer have a convex function. However, it can be shown that this function is minimized at \vec{x} = \vec{0}, and there is a surprisingly general argument that works and has been used to considerable effect in the literature.

The argument can be summarized intuitively as follows: show that from any point, there is a downward path to the minimum. For instance, in the above problem, one can define g(t) as follows:

g(t) = f(x_1 \cdot (1 - t), x_2 \cdot (1 - t)).

Then, it is straightforward to show that over the interval [0,1], g'(t) \leq 0 for either choice of f(x_1, x_2) above. Furthermore, g(0) = f(x_1, x_2) and g(1) = 0, so we can conclude that f(\vec{x}) \geq 0.
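As a quick check of that claim, here is the derivative worked out for both choices of f. The shorthand r^2 = x_1^2 + x_2^2 is introduced only for this calculation.

```latex
% Convex case: f(x_1, x_2) = x_1^2 + x_2^2
g(t) = (1 - t)^2 r^2, \qquad
g'(t) = -2 (1 - t) r^2 \leq 0 \quad \text{for } t \in [0, 1].

% Non-convex case: f(x_1, x_2) = \log(1 + x_1^2 + x_2^2)
g(t) = \log\left(1 + (1 - t)^2 r^2\right), \qquad
g'(t) = \frac{-2 (1 - t) r^2}{1 + (1 - t)^2 r^2} \leq 0 \quad \text{for } t \in [0, 1].
```

In both cases the sign comes from the factor (1 - t) \geq 0 on [0,1], so g is nonincreasing along the path and f(\vec{x}) = g(0) \geq g(1) = 0.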

Entropy Power Inequality

As mentioned before, this type of argument has proved to be potent in papers. One of the first places I encountered the argument was in the proof of the entropy power inequality. There are several equivalent statements of the result (see e.g. Dembo et al. 1991), one of which is the following. Given two independent random variables X and Y with differential entropies h(X) and h(Y), respectively, then

h(X + Y) \geq h(\tilde{X} + \tilde{Y}),

where \tilde{X} and \tilde{Y} are independent Gaussian random variables with the same differential entropies as X and Y, respectively. The entropy power inequality has been used to prove converses for Gaussian broadcast channels and the quadratic Gaussian CEO problem.

The proof essentially involves transforming the distributions of X and Y to the distributions of \tilde{X} and \tilde{Y} along a path that does not increase the differential entropy of the sum.

Parameter Redundancy of the KT-Estimator

I’ll end with another use of the argument found in Willems, Shtarkov, and Tjalkens’s “The Context-Tree Weighting Method: Basic Properties” from the May 1995 issue of the IEEE Transactions on Information Theory. Many of the results that follow from the paper make use of the Krichevski-Trofimov (KT) estimator, which approximates the probability distribution of a binary sequence based on the number of ones and zeroes. With the above argument, one can show that the parameter redundancy of a KT-estimator can be uniformly bounded. To state this mathematically, first let P_e (a, b) represent the KT-estimator for a sequence with a zeroes and b ones. Also note that for a Bernoulli sequence in which each symbol is a one with probability \theta, the probability of any particular sequence with a zeroes and b ones is (1 - \theta)^a \cdot \theta^b. The result states that for all values of \theta,

\log \frac{(1 - \theta)^a \cdot \theta^b}{P_e (a, b)} \leq \frac{1}{2} \log (a + b) + 1.

Note that the upper bound does not depend on \theta. This result is a key component for the authors to show that the redundancy of the context-tree weighting method is small and thereby demonstrate they have a compelling strategy for universal source coding.

The uniform bound above follows from a lower bound on the KT-estimator:

P_e (a, b) \geq \frac{1}{2} \cdot \frac{1}{\sqrt{a + b}} \cdot (\frac{a}{a+b})^a \cdot (\frac{b}{a+b})^b.

To prove the result, the authors define

\Delta(a, b) = \frac{P_e (a, b)}{\frac{1}{\sqrt{a + b}} \cdot (\frac{a}{a+b})^a \cdot (\frac{b}{a+b})^b}

and find a downward path to show that \Delta(a, b) \geq \Delta(1,0) = \Delta(0,1) = \frac{1}{2}.
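The post does not write out P_e (a, b) itself, so as a numerical sanity check, here is a sketch that assumes the standard product form of the KT-estimator, P_e (a, b) = \frac{\prod_{i=0}^{a-1} (i + 1/2) \cdot \prod_{j=0}^{b-1} (j + 1/2)}{(a + b)!}, and evaluates \Delta(a, b) directly. The function names are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// KT estimate for a sequence with a zeroes and b ones, built one
// symbol at a time: seeing another zero after i zeroes and n symbols
// in total multiplies the estimate by (i + 1/2) / (n + 1).
double KtEstimator(int a, int b) {
  assert(a >= 0 && b >= 0);
  double p = 1.0;
  int n = 0;  // symbols processed so far
  for (int i = 0; i < a; ++i, ++n) p *= (i + 0.5) / (n + 1.0);
  for (int j = 0; j < b; ++j, ++n) p *= (j + 0.5) / (n + 1.0);
  return p;
}

// Ratio of the KT estimate to the bound's reference term, using the
// convention 0^0 = 1 (std::pow returns 1 for a zero exponent).
double Delta(int a, int b) {
  assert(a >= 0 && b >= 0 && a + b >= 1);
  const double n = a + b;
  const double reference =
      std::pow(a / n, a) * std::pow(b / n, b) / std::sqrt(n);
  return KtEstimator(a, b) / reference;
}
```

Looping Delta over small values of a and b gives a feel for the claim that the minimum of 1/2 sits at (1,0) and (0,1). Plain doubles will underflow for long sequences, so this is only meant for modest counts.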

Posted in Information Theory, Papers

4 Prisoners

The end of the 100 Prisoners post asked if there is a way to show that there does not exist a strategy that meets the coupon collector lower bound for release when there are N > 3 prisoners.

Let’s first establish strategies for N \leq 3. Note that for N = 1, the prisoner can declare victory on the first day, which trivially meets the coupon collector lower bound. Similarly, for N = 2, the first time a new prisoner enters after the first day, the prisoner can declare victory since there is only one other prisoner, who entered on the first day. Again, this trivially meets the coupon collector bound.

For N = 3, we actually need to use the light switch. On day 1, the first prisoner turns the light switch off. The next new prisoner (second prisoner) to enter turns the light switch on. The next new prisoner (third prisoner) to see the light switch on declares victory. Once again, this meets the coupon collector lower bound.

Why don’t we have luck for N = 4? We can arrive at it by contradiction: suppose there is a strategy that meets the coupon collector lower bound. Note that this requires the fourth prisoner to be able to determine, based on the time of first entry and by looking at the light bulb, whether or not he is the fourth prisoner. Without the light switch, all a new prisoner knows for any time t \geq 4 days is that he is not the first prisoner. Thus, if such a strategy should work, for t \geq 4, the light switch should uniquely identify whether or not three other prisoners have visited the room.

Without loss of generality, for t \geq 4, the switch will be on if and only if three of the prisoners have visited the room already. Suppose a prisoner enters the room for the first time on some day t \geq 4. If the light switch is off, then the prisoner must set the switch for the next day. However, the only information available to the prisoner is t and the position of the switch, which indicates that either one or two prisoners have visited the room previously. If the prisoner sets the switch to on, and only one prisoner had visited the room before, the switch was set incorrectly. On the other hand, if the prisoner sets the switch to off, and two prisoners had visited the room before, the switch was also set incorrectly. Thus, we have arrived at a contradiction, and no strategy can achieve the coupon collector lower bound. Calculating an expected lower bound based on this argument and extending the argument to all N > 3 are left as exercises.

While the above argument indicates that the coupon collector lower bound is not tight for N = 4, it does not say whether the strategy given in the previous post is the best one can do. In fact, it is not, and what follows is a better strategy. Now, the light switch is used to indicate whether there have been an even or odd number of visitors to the room. The first visitor to the room switches it on, and every other prisoner changes the state of the light switch on his first visit to the room:

1st prisoner changes the switch to ‘on’
2nd prisoner enters for the first time with the switch ‘on’ and changes it to ‘off’
3rd prisoner enters for the first time with the switch ‘off’ and changes it to ‘on’
4th prisoner enters for the first time with the switch ‘on’ and changes it to ‘off’

Note that since the third prisoner is the only person to see the light switch off on his first time in the room, he can uniquely identify that there is only one prisoner left, and the next time he enters the room and sees the switch is off, he can declare victory. Likewise, if the first or second prisoner reenters the room after the third prisoner and before the fourth prisoner, then he can also figure out that there is only one prisoner left and can declare victory the next time he sees the switch off. It turns out the probability that both of them visit, which results in an expected time to release of

4/3 + \mathbb{E} [coupon collector],

is 1/3; the probability that only one of them visits, which results in an expected time to release of

2 + \mathbb{E} [coupon collector],

is 1/3; and the probability that neither visits, which results in an expected time to release of

4 + \mathbb{E} [coupon collector],

is 1/3. Thus, the expected time to release is \frac{22}{9} + \mathbb{E} [coupon collector], where \mathbb{E} [coupon collector] = \frac{25}{3}. This drops the expected time after coupon collector from 12 to about 2.4. Of course, we are taking advantage of the fact that the number of prisoners is so small. I suspect it will be more difficult to make such pronounced improvements over the earlier strategy for larger N.

Posted in Probability, Puzzle

Shannon Meets Shannon

He’s met almost everyone else: Wiener, Bode, Bellman, Carnot, Tesla, Marconi, and of course, Shortz. Bad jokes aside, in an attempt to understand the inverse water filling solution from rate-distortion theory better, I put together some rough notes attempting to connect it and the sampling theorem. It’s kind of old school but makes an interesting exercise for students of information theory and signal processing. There are almost certainly places in the notes where the descriptions could be stated better, and I haven’t thoroughly scrubbed it for typos, so any feedback is both welcomed and encouraged.

sampling-inverse-water-filling v1.0

Posted in Information Theory, Signal Processing

100 Prisoners

100 prisoners are condemned to life in prison, or so they think. One day the warden assembles all of the prisoners together and offers them a deal: “Starting tomorrow, I will select a prisoner at random every day and send him to a room with a lightbulb and switch. The prisoner may choose to turn the light on or off and must then leave. Now, here’s the deal: if, after visiting the room, a prisoner is convinced that all other prisoners have visited the room at least once, he may say so. If he is right, you will all be freed. If he is wrong, you will all be executed. After tonight, you will not be able to see or contact each other ever again, so devise a strategy now.” Morbidness aside, what strategy can the prisoners devise that guarantees their freedom?

After a friend first posed it to me, I’ve put this puzzle to several people over the years. Most who come up with a solution leave it at that, but my uncle was not one of them.

“It’s going to take them too long to get out,” he noted. I sympathized, but somehow I wasn’t convinced there was a better solution, either.

Before getting to that solution and the time to release, how long would it actually take for all the prisoners to visit the room at least once? The answer lies in the coupon collector’s problem. Given N prisoners, the time T for all of them to visit the room at least once can be expressed as the sum of geometric random variables T = T_1 + T_2 + \cdots + T_N, where T_i is the time for the ith new person to enter the room after the (i-1)th new person has been there. By linearity of expectation, the expected time for all prisoners to visit the room at least once can then be expressed as follows:

\mathbb{E}[T] = \sum_{i=1}^N \mathbb{E}[T_i] = \sum_{i=1}^N \frac{N}{N+1-i} = N \cdot \sum_{i=1}^N \frac{1}{i} ~.

For N = 100, \mathbb{E}[T] \approx 519 days, so the expected time for everyone to visit the room at least once would be less than two years.
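For anyone who wants to plug in other values of N, here is a minimal sketch that evaluates the sum above term by term. The function name ExpectedCollectionTime is made up for this illustration.

```cpp
#include <cassert>

// Expected number of days until all n prisoners have visited at least
// once: the sum over i of E[T_i] = n / (n + 1 - i).
double ExpectedCollectionTime(int n) {
  assert(n >= 1);
  double total = 0.0;
  for (int i = 1; i <= n; ++i) {
    total += static_cast<double>(n) / (n + 1 - i);
  }
  return total;
}
```

Calling it with n = 100 gives a value just under 519, and n = 4 gives 25/3.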

Now what about our solution? First, what was our solution? It works as follows. The person who enters on the first day becomes the monitor, who is the only one allowed to turn the light switch off. Everyone else is allowed either to turn the switch on or leave it as is. The goal is for the monitor to count the number of people who have visited the room at least once by the number of times he enters the room and the switch is on. To do this, the remaining prisoners follow this protocol: if the prisoner has never turned the switch on and sees it off, he turns it on; otherwise, he leaves it as is. This protocol prevents the monitor from overcounting the number of prisoners who have visited. Thus, once the monitor has entered the room with the switch on N-1 times (the monitor can ignore his first visit and automatically count himself), he can claim with certainty that each prisoner has visited the room at least once. However, once a switch is turned on, prisoners who have yet to visit the room must wait until the monitor visits the room again before they are allowed to indicate their entry, thereby delaying the monitor’s final announcement.

How long exactly does it take? Again, we can proceed by a sum of geometric random variables. Let U be the day the monitor announces every prisoner has been to the room at least once, which we express as the sum

U = T_1 + V_1 + M_1 + V_2 + M_2 + \cdots + V_{N-1} + M_{N-1}.

Here, M_i represents the number of days between the ith time the light is turned on and the monitor’s next visit to the room. This is a geometric random variable with expectation \mathbb{E}[M_i] = N. Similarly, V_i represents the number of days between the monitor’s last visit and the visit of the ith prisoner who can turn on the switch for the first time. Since N - i prisoners are still eligible to turn on the switch at that point, this is a geometric random variable with expectation \mathbb{E}[V_i] = \frac{N}{N - i}. Finally, T_1 is defined as before, and it is clear that T_1 = 1 day since whoever enters on the first day is automatically a first-time visitor to the room. By linearity of expectation, the expected time for the prisoners’ release is given as follows:

\mathbb{E}[U] = N \cdot \sum_{i=1}^N \frac{1}{i} + N \cdot (N-1)~.

For N = 100, \mathbb{E}[U] \approx 10,419 days, which is over 28 years! By that time, the warden will likely have retired and been replaced by a new warden who doesn’t respect the deal.
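Here is a sketch that adds up the expectations of T_1, V_i, and M_i one term at a time, as a check against the closed form above. The function name ExpectedReleaseTime is made up for this illustration.

```cpp
#include <cassert>

// Expected day on which the monitor makes the announcement under the
// counting strategy: E[T_1] plus the sum of E[V_i] and E[M_i] for
// i = 1, ..., n - 1.
double ExpectedReleaseTime(int n) {
  assert(n >= 1);
  double total = 1.0;  // T_1: the first day always brings a new visitor
  for (int i = 1; i <= n - 1; ++i) {
    total += static_cast<double>(n) / (n - i);  // E[V_i]
    total += n;                                 // E[M_i]
  }
  return total;
}
```

For n = 100 this comes out to roughly 10,419, and for n = 4 it gives 25/3 + 12, i.e. the coupon collector time plus N \cdot (N-1).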

I have a bit more to say about the problem, but for now, I’ll leave you with a couple problems to consider.

  1. Show that for N > 3, there exists no strategy for which the expected time to release is equal to the expected time to collect N coupons.
  2. Find a nontrivial lower bound in terms of N for the minimum expected time to release.
Posted in Probability, Puzzle