## Just Nuts

My Cuisinart ice cream maker had arrived only a few days before the e-mail did. I did a double-take when I saw the subject line:

trying to track down Nathan Kurz

Nathan Kurz? The Nathan Kurz? The Nathan Kurz of Scream sorbet? Yes, that one. The e-mail was from Charlotte Druckman, the reporter who had written about Scream for the New York Times. The article told the story of Nathan Kurz (yes, that one!), the computer programmer turned ice cream genius who had brought the creaminess of ice cream to the dairy-free world using nothing but the fat content of nuts. His Scream sorbet had built a devoted following in the Bay Area only to close its doors a few years later.

I was one of those devotees, and last Thanksgiving I’d written a blog post about reviving Scream’s signature pistachio sorbet recipe. Since then, I’d been making new flavors, recreating old ones, and sharing the results at parties. It was all thanks to a recipe that Nathan Kurz had shared with Joy of Blending. I hadn’t actually seen or spoken with him since Scream had closed its doors, and this newbie to the ice cream world had a bunch of questions for him.

Charlotte Druckman had questions for him, too, and when she discovered my blog, she figured she’d check whether I had his contact info. I wanted the answer to be yes, so before responding, I got in touch with a friend in the food industry who might have it. While waiting for a response, I searched the Internet and came across a GitHub account belonging to someone named Nathan Kurz.

I compared the account details with what I knew from the New York Times article. This Nathan Kurz had created the account in 2009, shortly before the birth of Scream sorbet, but had only started checking in code in late 2013, a few months after Scream had closed its doors. This Nathan Kurz was interested in CPU caches, and another Google search turned up a blog post he’d written about them. This Nathan Kurz’s blog included an e-mail address, so I sent him an e-mail asking whether he was also the Nathan Kurz of Scream sorbet.

A response came within an hour:

Hi Krish,

Yes, you’ve got the right me.

## Reviving Scream Sorbet

My unhealthy obsession with Scream Sorbet began at the Grand Lake Farmers’ Market in Oakland, where, after a run around Lake Merritt in late 2010, I rewarded myself with a scoop of Nathan Kurz’s pistachio sorbet. I’m a fan of ice cream and of pistachios separately, but pistachio ice cream is far from my favorite. Kurz’s scoop offered something different: all the creaminess of regular ice cream and all the flavor of pistachio, undiluted by cream. The creaminess came from the fat in the pistachios, which made it unlike the fruit-based sorbets I’d tried in the past. It was delicious, and the ingredients listed on the jar looked ridiculously simple: pistachios, water, sugar, and salt.

When Scream Sorbet opened a storefront in the Temescal a few months later, I looked for any excuse to go, evangelizing it to friends and visitors alike. Like a Teddy Ruxpin that had just been switched on, I would open my eyes wide as I described the magic of nut-based sorbets and tried to win over converts. I would even share an article in the New York Times about Scream to entice skeptics to give it a try.

I wasn’t expecting such an abrupt end when Scream closed its doors in 2013, and despite a one-off pop-up at Bittersweet Cafe a few months later, the sorbet never found a home elsewhere or returned to farmers’ markets. It was reduced to just a memory. While nut-based sorbets found their way to grocery stores, I never found a pistachio sorbet that matched the taste of Scream’s.

On the face of it, the sorbet seemed like it should be relatively simple to make. I had access to its four ingredients, and the Times article went into some detail about getting the proportions of fat, sugar, solids, and liquid right, as well as the type of blender Kurz used, but it didn’t give the numbers. The question was what those proportions were, so I searched online. It turned out that Kurz had shared his pistachio sorbet recipe, so I set out to revive the sorbet as a dessert following Thanksgiving dinner.

I spent Thanksgiving at the home of my cousin, whose husband, incidentally, had first shown me the Times article. He became my partner in crime, and we combined the ingredients using a blender that had none of the bells and whistles of a Pacojet or a Vitamix. I tried a spoonful of the blend, and the flavor I thought was lost to history was right there in the spoon. I gave a sample to my cousin-in-law, whose eyes lit up like Teddy Ruxpin’s.

Suffice it to say that the pistachio sorbet was a hit, and we wanted to try it again the next day. We were out of pistachios, but my cousin had walnuts at home, which contain a greater percentage of fat than pistachios, so we combined them with some papayas, running the numbers so that the proportions of solid to liquid (papayas are roughly 91% water) to fat to sugar roughly matched those of the pistachio recipe. The result was a papaya-walnut sorbet that was as creamy as ice cream. We had cracked the code!
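The “running the numbers” step amounts to bookkeeping on mass fractions. Below is a minimal sketch of that bookkeeping. The ingredient compositions are rough illustrative placeholders (only the papaya’s ~91% water comes from this post), not the figures from Kurz’s recipe, and the salt is ignored because it is only a pinch.

```python
COMPONENTS = ("water", "fat", "sugar", "other_solids")

# Illustrative placeholder compositions as mass fractions (each row sums to 1).
# Only the papaya's ~91% water comes from the post; the rest are rough
# assumptions, not the numbers from the original recipe.
COMPOSITION = {
    "water":      (1.00, 0.00, 0.00, 0.00),
    "sugar":      (0.00, 0.00, 1.00, 0.00),
    "pistachios": (0.04, 0.45, 0.08, 0.43),
    "walnuts":    (0.04, 0.65, 0.03, 0.28),
    "papaya":     (0.91, 0.00, 0.08, 0.01),
}


def blend_fractions(recipe):
    """Return the mass fraction of each component in a blend, given grams per ingredient."""
    total = sum(recipe.values())
    fractions = dict.fromkeys(COMPONENTS, 0.0)
    for ingredient, grams in recipe.items():
        # Each ingredient contributes its own component shares, weighted by its mass.
        for name, share in zip(COMPONENTS, COMPOSITION[ingredient]):
            fractions[name] += grams * share / total
    return fractions
```

One would compute the fractions for the pistachio version first, then adjust the grams of walnuts, papaya, and sugar until `blend_fractions` gives roughly the same water, fat, and sugar shares.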

For our batch, the pistachio sorbet cost roughly \$1.50 a scoop to make, and I think Scream used to charge \$3 or \$4 a scoop, but the per-scoop costs don’t tell the whole story. Based on some back-of-the-envelope calculations, with the conservative estimate that retail space in the Temescal costs \$5k a month and the assumption that the business needed at least one employee (e.g., the owner himself; in practice there were more) paid \$15/hour (i.e., minimum wage) for a 40-hour work week, Scream would have needed to sell nearly 20 scoops an hour to break even charging \$4 a scoop, or over 30 an hour charging \$3 a scoop. During many of my visits, Scream was not pulling in anywhere near that kind of traffic.
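The break-even estimate can be reproduced with a few lines of arithmetic. This sketch uses the figures from the paragraph above (\$1.50 cost per scoop, \$5k monthly rent, one employee at \$15/hour for 40 hours a week) and adds two simplifying assumptions of its own: the shop is open only while that one employee works, and a month is four weeks.

```python
def breakeven_scoops_per_hour(price, cost_per_scoop=1.50, rent_per_month=5000.0,
                              wage_per_hour=15.0, hours_per_week=40, weeks_per_month=4):
    """Scoops per open hour needed to cover rent and one employee's wages.

    Assumes the shop is open only during the employee's hours and that a month
    has `weeks_per_month` weeks (both simplifications for this sketch).
    """
    hours_per_month = hours_per_week * weeks_per_month
    # Fixed costs spread over every open hour: wages plus a share of the rent.
    fixed_cost_per_hour = wage_per_hour + rent_per_month / hours_per_month
    # Each scoop contributes its price minus its ingredient cost.
    margin = price - cost_per_scoop
    return fixed_cost_per_hour / margin
```

With a four-week month this gives 18.5 scoops an hour at \$4 and about 30.8 at \$3, in line with the figures above; using 52/12 weeks per month instead lowers them slightly, to about 17.5 and 29.2.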

On the other hand, if it were a dessert item in a restaurant, the expenses could be offset by other items on the menu. In the meantime, I’ve found a workaround.


## Translation

The idea of learning by translating is well known to anyone who has tried Duolingo, but I rediscovered it in a new context recently. I had planned to take piano lessons this fall at the SF Community Music Center but returned from vacation to discover that I’d missed the registration deadline. When I mentioned this to someone, they pointed out that I should be able to teach myself, especially since I’ve studied other instruments. While I agreed in principle, I shifted my focus to other pursuits.

Then on a whim yesterday, I decided to try playing the chords of “Wagon Wheel” on my keyboard. I wasn’t setting out to learn how to play the piano and figured I could just follow a rhythm that roughly matched up with the strum pattern for the song on the guitar. In fact, the chord progression is a slight variation on the Axis of Awesome’s 4 chords, and my strumming pattern is just a series of quarter notes, so it’s completely straightforward.

My initial strategy was simply to play each chord in root position, but it felt awkward to readjust all my fingers every time I changed chords. I revised the strategy accordingly: move from chord to chord by readjusting as few fingers as possible. While I didn’t work out a closed-form solution to this optimization problem, the heuristic I settled on led to new chord voicings that made changing chords significantly more fluid.
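One plausible version of such a heuristic (an assumption; the exact rule isn’t spelled out above) is a greedy search: for each chord, consider the root position and both inversions, and pick the voicing whose notes move the fewest total semitones from the previous chord. The sketch uses MIDI note numbers (60 is middle C), handles only natural-note major and minor triads, and keeps the bottom note between C3 and B3; all of these are illustrative choices.

```python
NOTE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def triad(name):
    """Pitch classes (0 = C) of a major triad like "G" or a minor triad like "Em"."""
    root = NOTE[name[0]]
    third = 3 if name.endswith("m") else 4
    return [root, (root + third) % 12, (root + 7) % 12]


def voicings(name, low=48, high=60):
    """Close-position voicings (root position and both inversions) as MIDI notes,
    with the bottom note in [low, high)."""
    pcs = triad(name)
    result = []
    for i in range(3):
        order = pcs[i:] + pcs[:i]  # rotate the chord tones to get each inversion
        for bottom in range(low, high):
            if bottom % 12 != order[0]:
                continue
            notes = [bottom]
            for pc in order[1:]:
                note = notes[-1] + 1
                while note % 12 != pc:  # stack the next chord tone just above
                    note += 1
                notes.append(note)
            result.append(tuple(notes))
    return result


def movement(a, b):
    """Total semitones the three fingers travel, matching notes lowest to highest."""
    return sum(abs(x - y) for x, y in zip(a, b))


def voice_progression(chords, start):
    """Greedy heuristic: voice each chord as close as possible to the previous voicing."""
    path = [start]
    for name in chords:
        path.append(min(voicings(name), key=lambda v: movement(path[-1], v)))
    return path
```

Starting from G in root position (G3, B3, D4) and moving through D, Em, and C (a I–V–vi–IV progression in G), the greedy choice lands on inversions that keep shared notes under the same fingers, such as D with F♯ on the bottom so that the D stays put. Being greedy, it can miss a globally better sequence of voicings.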

Then I noticed that I was only using one hand to play, and the “Wagon Wheel” melody comprises only four notes. I started playing the chords as whole notes with my left hand and focused on the melody with my right, and pretty soon I was playing “Wagon Wheel” on the piano! The simple act of translating the song from guitar to piano taught me something about playing the piano, albeit with some habits I might need to unlearn were I to take lessons in the future.

So I repeated the process for “Dink’s Song” and then “Hey There Delilah,” discovering along the way how much fun it is to learn the piano like this.

## The Bug 2

[Earlier installments of The Bug]

Now that I knew about the bug, the next thing to do was tell my advisor. It had been just a few weeks since he said to a luminary visiting campus, “One of my students generalized the entropy power inequality,” to which the luminary replied, “That’s impressive!” What had led me to smile then felt embarrassing now. Had the news spread elsewhere? Would this affect my advisor’s reputation in addition to my own if we ultimately had to retract the result? Why hadn’t I noticed the issue earlier?

To arrive at a plausible answer to that last question, it helps to understand how my biases shaped the way I checked the proof. The incorrect proof followed a technique similar to the one in Blachman’s “The convolution inequality for entropy powers”, and while Blachman confidently swapped things like the order of derivatives and expectations, I was less confident about these steps in my proof. Justifying these steps required applying some measure-theoretic results, and while I’d been exposed to them in a couple of probability theory courses, this was the first time I needed to apply them in my research. As a consequence of this insecurity, I focused my attention on making sure that these parts of the argument were watertight and failed to notice that I ultimately wouldn’t be able to use these watertight arguments to prove the result.

The actual bug came from somewhere that didn’t require any measure-theoretic sophistication. Both Blachman’s proof and mine applied Gaussian perturbations to random variables, with the difference that I had introduced an auxiliary random variable and required a Markov chain to hold. The problem for my proof was that the way I applied the Gaussian perturbations broke the Markov structure.
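A toy example shows how adding Gaussian noise can break a Markov chain in general; it is a generic illustration, not the construction used in the actual proof. Take $X \sim \mathcal{N}(0,1)$ and $U = Y = X$, so that $U - X - Y$ trivially forms a Markov chain (given $X$, both $U$ and $Y$ are fixed). Now perturb $X$ to $W = X + \sqrt{t}\,Z$, where $Z \sim \mathcal{N}(0,1)$ is independent of $X$ and $t > 0$. Because $(U, Y, W)$ are jointly Gaussian, the conditional covariance is

$$
\operatorname{Cov}(U, Y \mid W) = \operatorname{Cov}(U, Y) - \frac{\operatorname{Cov}(U, W)\,\operatorname{Cov}(W, Y)}{\operatorname{Var}(W)} = 1 - \frac{1 \cdot 1}{1 + t} = \frac{t}{1 + t} > 0,
$$

so $U$ and $Y$ remain correlated, and hence dependent, given $W$: the chain $U - W - Y$ no longer holds.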

I swallowed my pride, found my advisor, and explained the technical issue to him. Our discussion immediately shifted to practical matters. Could we repair the bug in time for the camera-ready deadlines of the different conferences? Part of this would depend on whether we could use the same proof technique. We still had a couple of weeks before the first camera-ready deadline (a paper that depended on the result) and three months before the camera-ready deadline for the last one (the result itself), so we decided to work on a patch first and retract later if necessary. We started brainstorming some attacks and looking up references that could help.

## The Bug

As I read through “Information theoretic inequalities”, it felt like the icing on the cake: a simpler proof of a result from my Master’s thesis. The result itself wasn’t the focus of my Master’s but instead facilitated the proof of a number of other results, so following a writeup of the initial proof and a sanity check by my advisor, I had submitted the result and its many corollaries as a series of papers to a number of conferences. Between those conference submissions and finishing my Master’s thesis, I hadn’t worried about looking for an elegant proof.

By February of 2006, the thesis had been filed and paper acceptances had started coming in, so my focus had shifted to finding alternate ways to show the same result. With that in mind, I had pulled up Dembo et al.’s “Information theoretic inequalities” along with other papers of interest, like Amir Dembo’s “Information inequalities and uncertainty principles”; the latter came back from Stanford with notes scrawled in blue ink in its margins, likely from Dembo himself, that read, “Ignore this part.”

My eyes had fixed upon a result in the paper showing that for real numbers $a,b,c,d$ where $d > 0$, the following two statements are equivalent:

1. $\displaystyle \exp{\frac{a}{d}} \geq \exp{\frac{b}{d}} + \exp{\frac{c}{d}}$
2. For all $0 \leq \lambda \leq 1$, $\displaystyle a \geq \lambda b + (1-\lambda) c - d(\lambda \ln \lambda + (1-\lambda)\ln (1-\lambda))$

The first statement falls out of the second by setting $\lambda = \frac{1}{1 + \exp \frac{c-b}{d}}$, and the second falls from the first by showing that this choice of $\lambda$ maximizes the right side of the inequality in the second statement. The equivalence above let me rewrite my claim in a way that reduced the act of computing derivatives and showing inequalities from a multipage undertaking to a few lines, and I would be able to simplify the “walking downhill” technique I was using to show the result.
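The equivalence can be spot-checked numerically. The sketch below evaluates the right side of statement 2 (using the convention $0 \ln 0 = 0$), confirms that the stated $\lambda$ attains the value $d \ln(e^{b/d} + e^{c/d})$, and compares both statements on either side of that threshold. Checking “for all $\lambda$” on a finite grid is only a numerical sanity check, not a proof, and the sample values are arbitrary.

```python
import math


def xlogx(x):
    """x ln x, with the usual convention 0 ln 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0


def rhs(lam, b, c, d):
    """Right side of statement 2 for a given lambda in [0, 1]."""
    return lam * b + (1 - lam) * c - d * (xlogx(lam) + xlogx(1 - lam))


def best_lambda(b, c, d):
    """The maximizing choice lambda = 1 / (1 + exp((c - b) / d))."""
    return 1.0 / (1.0 + math.exp((c - b) / d))


def statement1(a, b, c, d):
    return math.exp(a / d) >= math.exp(b / d) + math.exp(c / d)


def statement2(a, b, c, d, steps=10_000):
    """Statement 2 checked on a grid of lambdas (a spot check, not a proof)."""
    return all(a >= rhs(k / steps, b, c, d) for k in range(steps + 1))


b, c, d = 0.3, -1.2, 0.7
# The maximum of the right side: a log-sum-exp, reached at best_lambda.
peak = d * math.log(math.exp(b / d) + math.exp(c / d))
```

For `a` slightly above `peak`, both `statement1` and `statement2` hold; for `a` slightly below it, both fail, which is the equivalence in action.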

I started typing up the new proof, and when I went to apply one of the lemmas I had proven to show the result, I caught myself. Because of an operation that I was performing, one of the conditions needed to apply the lemma wouldn’t hold. Then I went back to the original proof and noticed that it suffered from the same problem. I had found a bug in the proof, and if I couldn’t resolve it quickly, I would have to retract papers and withdraw from conferences. If I couldn’t resolve it at all, a lot of the results that I had proven over the past several months would no longer hold, including one of the key ones from my Master’s thesis.

My Master’s thesis had focused on a class of problems known as CEO problems, which try to characterize fundamental limits on compressing noisy data from multiple sensors when those sensors communicate to a central estimation unit via rate-constrained links. The abstract of the paper introducing the CEO problem contains a more business-focused exposition:

A firm’s Chief Executive Officer (CEO) is interested in the data sequence $\{X(t)\}_{t=1}^{\infty}$ which cannot be observed directly, perhaps because it represents tactical decisions by a competing firm. The CEO deploys a team of $L$ agents who observe independently corrupted versions of $\{X(t)\}_{t=1}^{\infty}$. Because $\{X(t)\}$ is only one among many pressing matters to which the CEO must attend, the combined data rate at which the agents may communicate information about their observations to the CEO is limited to, say, $R$ bits per second. If the agents were permitted to confer and pool their data, then in the limit as $L \rightarrow \infty$ they usually would be able to smooth out their independent observation noises entirely. … In particular, with such data pooling $D$ can be made arbitrarily small if $R$ exceeds the entropy rate $H$ of $\{X(t)\}$. Suppose, however, that the agents are not permitted to convene, Agent $i$ having to send data based solely on his own noisy observations $\{Y_i(t)\}$. We show that then there does not exist a finite value of $R$ for which even infinitely many agents can make $D$ arbitrarily small.

My work had been inspired by a paper by Yasutada Oohama, who had found the sum-rate-distortion function of the quadratic Gaussian CEO problem.

My goal had been to extend his results to non-Gaussian sources, and while I could show that his sum-rate-distortion function was an upper bound in those cases, I had struggled to find a corresponding lower bound. My hope had been to apply his techniques to derive such a lower bound, but Oohama’s work relied on a specific property of the geometry of Gaussian random variables. Specifically, his lower bound tied the orthogonality principle of conditional expectation to statistical independence: for jointly Gaussian random variables, an estimation error that is orthogonal to the observations is also independent of them. That independence is what allowed him to use the entropy power inequality to derive a lower bound.
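A toy discrete example (an illustrative assumption, not taken from Oohama’s paper) shows why the Gaussian case is special: an estimation error can be orthogonal to every function of the observation and still depend on it. Here $Y$ is 1 or 2, $N$ is an independent random sign, and $X = YN$, so $E[X \mid Y] = 0$ and the error is $X$ itself, whose magnitude reveals $Y$. The sketch computes everything exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# Toy joint distribution (assumed for illustration): Y in {1, 2} and N in {-1, +1},
# each uniform and independent, with X = Y * N.
outcomes = [(y, n, Fraction(1, 4)) for y, n in product((1, 2), (-1, 1))]


def expect(f):
    """Exact expectation of f(Y, N) under the toy distribution."""
    return sum(p * f(y, n) for y, n, p in outcomes)


def cond_mean_x(y0):
    """E[X | Y = y0], computed from the joint distribution."""
    num = sum(p * y * n for y, n, p in outcomes if y == y0)
    den = sum(p for y, n, p in outcomes if y == y0)
    return num / den


def error(y, n):
    """Estimation error X - E[X | Y]."""
    return y * n - cond_mean_x(y)


# Orthogonality: the error is uncorrelated with each function of Y tried here.
orthogonal = all(expect(lambda y, n, g=g: error(y, n) * g(y)) == 0
                 for g in (lambda y: 1, lambda y: y, lambda y: y * y))

# ...but it is not independent of Y: |error| = Y, so P(|error| = 2, Y = 1) = 0
# while P(|error| = 2) * P(Y = 1) = 1/4.
p_joint = expect(lambda y, n: 1 if abs(error(y, n)) == 2 and y == 1 else 0)
p_product = (expect(lambda y, n: 1 if abs(error(y, n)) == 2 else 0)
             * expect(lambda y, n: 1 if y == 1 else 0))
```

For jointly Gaussian variables, orthogonality and independence coincide, so this gap never appears; for non-Gaussian sources, it is exactly what an entropy-power-inequality argument has to work around.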

The proof in which I had just found a bug was an attempt to generalize the entropy power inequality to get around the fact that for non-Gaussian sources, this statistical independence condition would not hold. My attempts to fix the proof or find a counterexample would take me down the rabbit hole of related results and techniques, from Young’s inequality to the Brascamp-Lieb inequality, all because I had just realized the claim I had based part of my identity on was merely a conjecture.

## Invoking Authority

Over dinner recently, a friend mentioned that when he is unfamiliar with a topic, he tends to accept what he reads about it at face value. The remark touched on some thoughts I’ve been having about how the way information is presented can affect how people interpret it, so I shared one example that has been puzzling me.

Suppose I were to say, “Red is better than blue.” Alternatively, suppose I were to say, “I prefer red to blue.” In what sense are these two sentences different from one another?

Both statements are made from my perspective, so one could argue that if I say, “Red is better than blue,” then I must prefer red to blue. On the other hand, among the handful of people to whom I’ve posed these sentences, the first one is more likely to invite an argument. To argue against the first, one might be able to appeal to some objective knowledge about the world that contradicts the statement, but to argue against the second, one has to know something about my preferences, and one could argue there is no greater authority on my preferences than I.

My friend’s intuitions appeared to match what others had said about that example, so I tried to construct an analogous pair of sentences that isn’t concerned with the opinions of the speaker, to see whether a larger pattern might emerge.

Suppose I were to say, “The sky is blue.” Alternatively, suppose I were to say, “I read in an encyclopedia that the sky is blue.”

Again, the first sentence is more likely to invite an argument than the second. Taking this together with the previous example, I wonder whether this all comes down to authority. What makes me an authority on the color of the sky? On the other hand, it could be argued that I am an authority on my own memories, so saying, “I read in an encyclopedia that…” suggests that I did read the information. It could also be argued that the encyclopedia is a more reliable authority on the color of the sky than I am, and therefore the second sentence is more authoritative than the first.

I’d be curious to find out to what extent these types of sentences have been studied and whether the thoughts others have in any way line up with the intuitions I’ve detailed.


## Propagating Errors

Jim mentioned Puzzled Pint a few months ago, but it conflicted with an improv rehearsal. However, this month, we decided to make Puzzled Pint our troupe’s social outing.

The first step in Puzzled Pint is to figure out where the event will be held. To do this, one solves a location puzzle. In our case, we needed to traverse a maze, and the puzzle was sensitive in the sense that early mistakes propagated to later steps. It took a while to realize that we had missed a clue, but once we resolved it, we were on our way.

What’s interesting about a puzzle with this type of propagating error is that it’s fairly simple to detect that an error was made, since the solution is typically an English word or phrase, but not necessarily obvious what the error was. It’s akin to losing an inter frame in a video stream and watching the error propagate across subsequent frames until the start of the next group of pictures (GOP). The difference between the puzzle and video coding is that, given a codec, there is a well-understood way to decode the video frames, whereas in the puzzle, the decoding mechanism is intentionally obfuscated and must be deduced along with the solution.
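A toy model of that propagation: assume a stream of keyframes `("I", value)` that decode on their own and predicted frames `("P", delta)` that add a difference to the previous decoded frame. Real codecs predict blocks of pixels with motion compensation, so this sketch only captures the dependency structure, and the frame values are made up.

```python
def decode(stream):
    """Decode a toy stream of ("I", value) keyframes and ("P", delta) predicted frames.

    The stream is assumed to start with a keyframe.
    """
    frames = []
    current = None
    for kind, value in stream:
        if kind == "I":
            current = value            # keyframe: decoded independently
        else:
            current = current + value  # predicted frame: builds on the previous frame
        frames.append(current)
    return frames


def corrupted_frames(stream, index, bad_value):
    """Indices of decoded frames that change when frame `index` arrives with a wrong value."""
    damaged = list(stream)
    kind, _ = damaged[index]
    damaged[index] = (kind, bad_value)
    return [i for i, (good, bad) in enumerate(zip(decode(stream), decode(damaged)))
            if good != bad]


stream = [("I", 10), ("P", 1), ("P", 1), ("P", 1), ("I", 20), ("P", 1)]
```

Corrupting the predicted frame at index 1 damages frames 1 through 3, and the keyframe at index 4 resets decoding, so the error stops spreading there.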

This makes a mistake in a puzzle all the more challenging to debug. For instance, in an early attempt at the February location puzzle, I decoded ROTATERETAMETOYONE. Because the prefix ROTATE is a word, I was confident that my method was on the right track, but it wasn’t clear whether I was applying the correct method and had simply overlooked a piece along the way, or whether my method was too simplistic and something more complex needed to be applied before moving on to the next word.

Ultimately, someone figured out what was missing, and we had a mechanism to validate that the solution we had arrived at was correct. That leads to a second difference between video coding and puzzle solving: it’s sometimes possible to arrive at and validate a solution while sidestepping the method the puzzle expects one to apply. For instance, once one has decoded enough of the answer that the entropy of the remaining possibilities is sufficiently low, one’s knowledge of English can provide an alternate route to the solution without figuring out the intended method. In fact, in some cases, one might reverse-engineer the remaining pieces of the method after discovering the solution.
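The “low entropy” shortcut can be sketched as pattern matching against a word list: each decoded letter rules out candidates, and once only one word fits, the answer is known even if the method isn’t. The word list here is a tiny hypothetical stand-in for a real dictionary, and treating every consistent word as equally likely is a simplifying assumption.

```python
import math
import re

# A tiny, hypothetical word list; a real attempt would use a full dictionary.
WORDS = ["ROTATE", "RELATE", "REBATE", "ORNATE", "DONATE", "ROTATED", "ROSTER"]


def candidates(pattern, words=WORDS):
    """Words consistent with a partially decoded answer; '?' marks an unknown letter."""
    regex = re.compile(pattern.replace("?", "[A-Z]"))
    return [w for w in words if regex.fullmatch(w)]


def remaining_bits(pattern, words=WORDS):
    """Uncertainty left, in bits, assuming each consistent word is equally likely."""
    n = len(candidates(pattern, words))
    return math.log2(n) if n else float("nan")
```

With two letters unknown, `R??ATE` still matches three words (about 1.58 bits of uncertainty); revealing one more letter, as in `RO?ATE`, leaves a single candidate and zero bits, at which point the answer can be guessed and then checked.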