Author Archives: Noah Stephens-Davidowitz

A mysterious constant called pi, arising from the Gaussian integral (with a minor application to circles)

Hi, nerd blog! (This is a post that I wrote a long time ago and then never published. I figured it would be nice to publish it on March 14th.)

Today, we’re interested in the Gaussian integral

    \[f(C) := \int_{-\infty}^\infty e^{-Cx^2} {\rm d} x\]

for C > 0. This integral of course has lots of very serious practical applications, as it arises naturally in the study of the Gaussian/normal distribution. But, more importantly, it’s a lot of fun to play with and is simply beautiful. We’ll see a bit about what makes it so pretty below. We start by simply trying to figure out the value of this thing, which isn’t super easy.

By a change of variables, we immediately see that f(C) is proportional to 1/\sqrt{C}. But, what is the constant of proportionality? It’s actually nicer to ask a slightly different question: what is the unique value of C such that f(C) = 1? A quick numerical computation shows that f(3.14159) \approx 1. E.g., here’s some Mathematica code to find this value:
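(The snippet below is only a minimal sketch of the idea: numerically solve f(C) = 1 for C, starting the search near 3.)

    f[c_?NumericQ] := NIntegrate[Exp[-c x^2], {x, -Infinity, Infinity}]
    FindRoot[f[c] == 1, {c, 3, 3.2}]
    (* {c -> 3.14159} *)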

This constant C \approx 3.14159 is so important for this blog post that it is worth giving it a name. So, I looked through the Greek alphabet for a nice letter that doesn’t get used much and chose the obscure lowercase letter \pi—spelled pi in English, and pronounced like “pie”. In other words, by definition f(\pi) = 1. (If this implicit definition bothers you, we can equivalently just define \pi := f(1)^{2}. But, I find the implicit definition to be more elegant.)

So, we have this brand new mysterious constant \pi. What should we do with it? It is of course natural to try to find different expressions for it (though our integral expression can already be used to compute it to quite high precision). A first idea is to apply the change of variables u = \pi x^2 (so that {\rm d} u = 2 \pi x \, {\rm d} x) to obtain

    \[1 = 2\int_{0}^{\infty} e^{-\pi x^2}{\rm d} x = \pi^{-1/2} \int_0^{\infty} e^{-u}/u^{1/2} {\rm d} u\; .\]

So,

    \[\pi =\Big( \int_0^\infty e^{-u}/u^{1/2} {\rm d} u\Big)^2\; ,\]

which you might recognize as the square of the Gamma function evaluated at 1/2, i.e., \pi = \Gamma(1/2)^2. (Recalling that \Gamma(n) = (n-1)! for integer n, one might interpret this as saying that \sqrt{\pi} is “the factorial of -1/2.”)
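
(As a quick sanity check, Mathematica will happily confirm this identity numerically:)

    Gamma[1/2]^2 // N
    (* 3.14159 *)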

This mysterious identity will play a key role later. We could, of course, find other identities involving this new constant \pi. But, I thought instead I’d jump ahead to a rather obscure fact about the relationship between \pi and a circle.

Our constant’s relationship with circles

In my opinion, the Gaussian distribution is far more interesting in dimensions larger than one. In particular, consider the distribution on \mathbb{R}^n given by the probability density function

    \[\Pr[\mathbf{x}] = e^{-\pi \|\mathbf{x}\|^2}\; .\]

Notice that

    \[\int_{\mathbb{R}^n}e^{-\pi \|\mathbf{x}\|^2} {\rm d} \mathbf{x} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\pi x_1^2 -\cdots - \pi x_n^2} {\rm d}x_1 \ldots {\rm d} x_n = 1\; ,\]

so that this is in fact a distribution.

In fact, up to scaling, this distribution is the unique continuous radial product distribution—i.e., the unique distribution such that \Pr[\mathbf{x}] can be written both as a function only of the norm of \mathbf{x}, \Pr[\mathbf{x}] = f^*(\|\mathbf{x}\|) for some continuous function f^*, and as a product of functions of its coordinates, \Pr[\mathbf{x}] = f_1(x_1)\cdots f_n(x_n). This makes the Gaussian a uniquely powerful tool for reducing complicated multi-dimensional problems to one-dimensional problems.
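
(Here is a rough sketch of why, assuming enough continuity to make everything behave. Writing t_i := x_i^2 \geq 0 and taking logarithms, the radial form and the product form together say that

    \[\log f^*\Big(\sqrt{t_1 + \cdots + t_n}\Big) = \log f_1\big(\sqrt{t_1}\big) + \cdots + \log f_n\big(\sqrt{t_n}\big)\; ,\]

i.e., a function of the sum t_1 + \cdots + t_n splits as a sum of functions of the individual t_i. Plugging in zeros for all but one or two of the t_i reduces this to Cauchy’s functional equation, whose continuous solutions are linear. So \log \Pr[\mathbf{x}] is an affine function of \|\mathbf{x}\|^2, i.e., \Pr[\mathbf{x}] \propto e^{b \|\mathbf{x}\|^2} for some constant b, where b must be negative for this to be a probability distribution, and rescaling \mathbf{x} lets us take b = -\pi.)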

For example, suppose that for some strange reason we wish to know the circumference of a circle with radius one. (If we were less civilized mathematicians, we might instead set the diameter to be equal to 1, so that the radius would be 1/2.) We can try to write this as some kind of path integral or something—and suffer quite a bit in the process—or we can use the following beautiful trick. We can write

    \[1 = \int_{\mathbb{R}^2} e^{-\pi \|x\|^2} {\rm d} x = \int_0^\infty e^{-\pi r^2} \sigma_r {\rm d} r= \sigma_1 \int_0^\infty e^{-\pi r^2} r {\rm d} r\;,\]


where \sigma_r is the circumference of a circle with radius r. (The only facts that we have used here are our definition of \pi together with the fact that \sigma_r = r \sigma_1.) Fortunately, the last integral is easy to compute as

    \[\int_0^\infty e^{-\pi r^2} r {\rm d} r = \frac{1}{2\pi} \cdot \int_0^\infty e^{-u} {\rm d} u = \frac{1}{2\pi} \;.\]

Rearranging, we see that \sigma_1 = 2\pi!
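
(And, if you don’t trust my change of variables, Mathematica agrees:)

    Integrate[E^(-Pi r^2) r, {r, 0, Infinity}]
    (* 1/(2 Pi) *)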

So, surprisingly, our mysterious constant \pi is actually intimately related with the circumference of a circle. (If we were less civilized mathematicians, we might even have simply defined \pi to be the circumference of a circle with radius 1/2.)

What’s so special about two dimensions? Surface area of n-spheres.

But, why stop in dimension 2? This same one neat trick is just as useful in higher dimensions. E.g., what is the surface area \sigma_1^{(n-1)} of a unit sphere in n dimensions? (Conventionally, the unit sphere in n dimensions is written S^{n-1} because it is an (n-1)-dimensional object that happens to be embedded in n-dimensional space. This is why I write \sigma^{(n-1)}_1 for its surface area.) Well, we have

    \[1 = \int_{\mathbb{R}^n} e^{-\pi \|x\|^2} {\rm d} x = \int_0^\infty e^{-\pi r^2} \sigma^{(n-1)}_r {\rm d} r = \sigma^{(n-1)}_1 \int_0^\infty e^{-\pi r^2} r^{n-1} {\rm d} r\; .\]

(Again, the only property that I have used here is that \sigma_r^{(n-1)} = r^{n-1} \sigma_1^{(n-1)}.) This integral is a bit less pretty, but using the same approach, we see that

    \[\int_0^\infty e^{-\pi r^2} r^{n-1} {\rm d} r = \frac{1}{2\pi^{n/2}} \cdot \int_0^{\infty} e^{-u} u^{n/2-1} {\rm d} u = \frac{\Gamma(n/2)}{2\pi^{n/2}}\; ,\]

where the last step is literally just plugging in the definition of the Gamma function. Rearranging, we see that the surface area of the unit sphere in n dimensions is exactly \frac{2\pi^{n/2}}{\Gamma(n/2)}.

If the Gamma function intimidates you, that’s fine. (It certainly intimidates me.) We can go a bit further by remembering that for integers m, \Gamma(m) = (m-1)!, while \Gamma(m+1/2) = \sqrt{\pi}(2m)!/(4^m m!). (Both of these identities follow from the relation \Gamma(x) = (x-1) \Gamma(x-1), which follows from integration by parts, together with the values \Gamma(1) = 1 and \Gamma(1/2) = \sqrt{\pi}.)

Then, we see that the surface area of the unit sphere in n dimensions is

    \[\sigma_1^{(n-1)} = \begin{cases}\pi^{n/2} \cdot \frac{2}{(n/2-1)!} &n\text{ even}\\\pi^{(n-1)/2} \cdot \frac{2^n \cdot ((n-1)/2)!}{(n-1)!} &n\text{ odd}.\end{cases}\]

In particular, from this formula, we immediately see the perhaps surprising fact that the surface area of the sphere in n dimensions rapidly approaches 0 as n \to \infty. (I.e., “n-dimensional unit spheres are tiny.”) We also see the rather strange fact that \sigma_1^{(n-1)} is a rational multiple of \pi^{\lfloor n/2 \rfloor}.

We can also plug in low values of n to see what we get. E.g., I have heard that some people are interested in the cases n = 2 and n = 3. Plugging in, one sees that the circumference of a circle with radius one is 2\pi (which, ok, we already saw before), and that the surface area of a sphere with radius one is 4\pi. But, we can easily go farther: the surface area in four dimensions is 2\pi^2, and in five dimensions, it is 8\pi^{2}/3.
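
(If you’d rather have Mathematica do the plugging in, here is a tiny sketch; surfaceArea is just a name I’m making up for the formula above:)

    surfaceArea[n_] := 2 Pi^(n/2)/Gamma[n/2]  (* surface area of the unit sphere in n dimensions *)
    Table[surfaceArea[n], {n, 2, 5}]
    (* {2 Pi, 4 Pi, 2 Pi^2, 8 Pi^2/3} *)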

And, we can of course find the volume of the unit n-ball by computing a simple integral

    \[V^{(n)} = \int_0^{1}\sigma_r^{(n-1)} {\rm d} r = \sigma_1^{(n-1)} \cdot \int_0^{1} r^{n-1} {\rm d} r = \sigma_1^{(n-1)}/n \; .\]
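
(Since \Gamma(n/2 + 1) = (n/2) \Gamma(n/2), this is the same as \pi^{n/2}/\Gamma(n/2 + 1). Here is one last made-up Mathematica sketch, with an equally made-up name:)

    ballVolume[n_] := Pi^(n/2)/Gamma[n/2 + 1]  (* = surfaceArea[n]/n *)
    Table[ballVolume[n], {n, 1, 4}]
    (* {2, Pi, 4 Pi/3, Pi^2/2} *)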

In short, I think this mysterious constant \pi is rather nice. Perhaps it will find other applications.

String indexing and compression with the FM-Index

I’m currently interning at Seven Bridges Genomics for a few weeks, which is awesome. They asked me to give a talk on the FM-Index–a highly compressed data structure that allows for efficient search queries. This is used for a fundamental step in sequencing a genome, called alignment, in which you take a bunch of strings that are about 100 characters long that the chemistry spits out and try to fit them together into one long meaningful genome. (For more about alignment, check out this awesome blog post by Nate Meyvis.)

Some SBG people wanted me to share my slides, but the slides that I typically make are completely unsharable. So, I made a video, and now my lucky blog readers get the opportunity to listen to me talk for 40 minutes about the FM-Index! (You know… if you’re into that sort of thing…) Here’s the link.

(You should probably watch it in 720p or 1080p because otherwise some of the text is annoyingly blurry for some reason.)

A Little Security Flaw in Lots of Places

Hi, nerd blog. I’m not a person who cares much about privacy (which makes the fact that I’m currently a PhD student studying cryptography a little weird). But, there are some relatively basic levels of privacy that I think people are clearly entitled to.

For example, suppose that I want to know whether or not someone has an account on a certain website, and I don’t think he’d want to tell me. It’s not hard to imagine an example where that information could be harmful. The site might be for online dating or porn, or it might be somehow related to a counterculture, a protest movement, etc. The person might be someone that I’m angry at, a romantic interest, someone that I’m considering hiring, an ex, etc. Anyone who’s gone to middle school should recognize that people will certainly look for this information if they know where to find it.

It turns out that on many websites, you can do this–including extremely large websites. I’ll use economist.com as my first example because it’s a non-embarrassing website that I have an account on that has this flaw, as shown below.

Here’s what happens when you try to log in to economist.com with a gibberish e-mail address:

[Screenshot: economist.com login error for a gibberish e-mail address]

Here’s what happens when I use my e-mail address and gibberish for a password:

[Screenshot: economist.com login error for my e-mail address with a gibberish password]

So, you now all know that I have an account on this website, and if I want to know whether or not someone else does, all I need is her e-mail address.

Facebook also makes this mistake. Here are my Facebook privacy settings, in which I quite unambiguously say that you should need to be friends with one of my friends to find my Facebook account using my e-mail address:

[Screenshot: my Facebook privacy settings]

Here’s what happens when you simply enter my e-mail address into Facebook’s login with a gibberish password:

[Screenshot: Facebook’s failed-login page for my e-mail address and a gibberish password]

On Facebook, this actually also works with phone numbers, and either way, they conveniently show an image of me and my name, in addition to verifying that I have an account.

Now, that’s just stupid, and it shouldn’t happen. The way to fix this is incredibly obvious: On a failed login, just say “Sorry, that e-mail/password combination is wrong.” instead of telling users specifically which credential is wrong. This has been accepted best practice for a long time, and it’s amazing to see such large companies screw it up. However, it’s hard not to roll your eyes because, well, most people would not be embarrassed if someone who already had their e-mail address also knew whether or not they subscribed to The Economist or had a Facebook account.

So, let’s pick something more sensitive. Say, for example, pornography. According to Wikipedia, Pornhub is one of the largest porn sites, and the 81st most popular website on the entire internet according to Alexa rankings (which likely undercount such sites). It’s fairly obvious why someone wouldn’t want anyone with his e-mail address to know whether or not he has a Pornhub account. While Pornhub does not fall to the exact attack that I described above, it’s a nice example of a slightly different attack that accomplishes the same thing. Here’s what happens when you enter an invalid e-mail address into Pornhub’s “Forgot your password” page:

[Screenshot: Pornhub’s “Forgot your password” error for an invalid e-mail address]

So, want to know whether or not someone has a Pornhub account? Go to the website, click “Forgot your password,” and enter his e-mail address. (Of course, if he does have an account, he’ll get an e-mail. He might ignore it, but if he’s aware of this attack, he might recognize that someone did this to him. He won’t know who, though, and you’ll still have the information.)

In spite of the fact that this has been known for a long time, I’m willing to bet that the second attack works on the vast majority of sites on the web. I’ll spare more screenshots and naming-and-shaming, but a quick check shows that it works on lots of other porn sites, lots of dating sites, lots of online forums about sensitive topics, sites for companies devoted to web security, and, in general, just about every site I checked. (Sometimes, you have to use the “Forgot your username” link instead of “Forgot your password.”)

The solution? Obviously, just say “If you’ve entered the correct address, you will receive an e-mail shortly. If you don’t, please try a different address.” Fortunately, one site that screwed a lot of stuff up actually got this right: Healthcare.gov.

The Bootstrap Species

Double disclaimer:

  1. This is gonna be a rambling rant, as usual.
  2. I believe in evolution, not intelligent design. But, part of what’s really cool about evolution is that it creates things that look intelligently designed. It’s therefore very convenient to talk about a species as though it was built to some deity’s specifications. So, I’m gonna make liberal use of that metaphor. Feel free to replace evocative, convenient, and technically incorrect phrases like “We were designed to survive” with the more boring, stilted, and accurate “We naturally happen to be ridiculously good at surviving because if we weren’t better at it than most of our innumerable competitors we wouldn’t exist” if you feel like it.

When I walk into a pharmacy (and, as a typical New Yorker, I buy a remarkable amount of my stuff at pharmacies), I’m often struck by the ridiculous number of things that we have to cure our ailments.

Mostly, it’s a reminder of the amazing variety of ways in which we break: Our bodies crack, leak, buckle, and bleed. Things grow on our bodies and in them. They eat us. Our organs stop functioning or function too quickly or too slowly. We itch. Even our brains–the things that are supposed to be us–often behave in ways that we wish they didn’t. Pharmacies have all sorts of things to cure many of these ailments and to make many more of them more tolerable.

Even stranger, we’re often troubled when things function exactly as they should. Our faces and bodies sprout hair; our nails grow long; sex leads to pregnancy. The pharmacy has solutions to all of these problems too–problems caused by a perfectly healthy body doing what it was designed to do.

There are even solutions to the problems caused by our solutions to other problems.

And, of course, things are pretty awesome as a result. We’ve slowly developed methods (some complicated but many amazingly simple) to tinker with these extremely complex machines that we live in–that we are–to get them to do what we want. We’ve built sprawling pharmacies, and we live long lives with amazing consistency. We’re happier and healthier; we even look better.

But, that is exactly what we’re doing: We’re tinkering with our bodies, and it’s fascinating. We’re now using the most sophisticated machines that we’ve ever encountered to do tasks for which they were never designed. And it’s working! More specifically, we pursue not just survival, not just reproduction, but happiness.

Continue reading

Measuring the Difficulty of a Question (What Complexity Theory Is)

Suppose you’re given some clear question with a well-defined answer. Computer scientists often like to consider number-theoretic questions, such as “Does 19 divide 1547 evenly?”

It seems quite natural to try to assign to this problem some sort of measure of difficulty. For example, from an intuitive perspective, it’s easier to figure out whether 19 divides 77 than whether it divides 1547. And, figuring out whether two divides 1548 is totally trivial. And, of course, there are much harder problems, like counting all the prime numbers below 1548.

Some problems’ difficulties take a second to figure out. For example, is it harder to figure out what the square root of 15,129 is or to count the squares less than 15,130? Intuitively, it might seem like the second one should be harder, but really, the answer to the second question is the same as the answer to the first–15,129 is 123^2, and there are 123 squares less than 15,130 (not counting zero), one for each positive integer less than or equal to 123. So the first problem is exactly as difficult as the second.

So, some problems’ difficulties are intuitively comparable. Some are still clearly comparable, but realizing that isn’t completely trivial. But, some problems just seem terribly unrelated. For example, is it easier to count the primes less than 78 or to tell whether 1547 is prime? Is it easier to count the squares less than 123,456 or count the primes less than 78? For questions like these, it seems likely that the answers actually depend on our definition of difficulty.

So, we need a rigorous definition of difficulty. The field that studies such definitions and categorizes problems according to their difficulty is called complexity theory, and it’s one of my favorite branches of mathematics. What follows is an NSD-style introduction to the field:


A Me-Based Notion of Difficulty

Continue reading

What’s Cancer?

There’s a nice layman’s explanation for how most types of diseases work. Viruses, for example, are little pods that inject DNA or RNA into your cells, thus replicating themselves. (Here’s an awesome video by Robert Krulwich and David Bolinsky illustrating how a flu virus works.) Similarly, bacteria, fungi, and various other parasites are independent organisms–made of cells just like us–and it’s their process of going about their lives inside of us–eating various things (sometimes parts of us), excreting, releasing various chemicals, etc–that makes us sick (or gives us rashes or helps us digest food or boosts our immune system, etc). These various pathogens have evolved to kill. Autoimmune diseases are incredibly complicated (and often poorly understood), but the rough idea is clear: We have cells in our body that are designed to kill off viruses, bacteria, fungi, etc., and sometimes they screw up and kill the good guys instead of the bad guys.

All of those simplified descriptions make perfect sense, and as I’ve become more of a medical nerd, they’ve essentially held up to scrutiny. In other words, they’re good models.

However, the pop-culture description(s) of cancer never really satisfied me. In fact, I would argue that most descriptions of cancer actually ignore the basic question of what the hell this thing is and why it exists. Even my beloved Wikipedia neglects to just come out and say what the damn thing is in its main article about it, though its article on carcinogenesis is pretty good. That type of thing gets on my nerves–I don’t really understand how people can be comfortable talking about something without knowing what it is.

So, until recently, I knew embarrassingly little about cancer. Obviously, I understood that cells mutated and divided rapidly to form cancer, but what made them do that? And why are such a huge variety of things implicated in causing cancer? Radiation, cigarette smoke, HPV, and asbestos are very different things, so how can they all cause the same type of disease? What makes these evil mutant cells so damn deadly? What the hell is metastasis, and why does it happen? I did a decent amount of research and learned some stuff, but I still didn’t really understand what cancer actually is.

That changed thanks to the awesome NPR radio show/podcast Radiolab (which you should absolutely check out if you haven’t already–especially the archives). Their episode on cancer, “Famous Tumors,” finally provided me with a description of cancer that made any sense. It’s still the only decent layman’s explanation of what cancer actually is that I’ve been able to find (and I did a lot of searching while I was writing up this blog post). Once I understood the basics, I found that my other research on the subject actually made sense–even though almost all of it neglected to define the concept in question.

Anyway, you should probably just listen to “Famous Tumors” right now. Seriously. The part that led to my epiphany is a very brief segment that starts around minute eleven or so, and (as is Radiolab’s style), it’s described quite beautifully.

However, for those of you who’d rather read my ramblings, I’ll provide my own description below. In keeping with my tradition of verbosity, this post will take a while to actually mention cancer directly, but I swear the whole thing’s about cancer. Bear with me.

The story starts with a description of what we are:


Evolution and Life as a Cell in a Multicellular Organism

Continue reading

A Better Idea than Signature by Squiggle

Like most of you, I sign stuff a lot.

Like many of you, I’m usually a bit embarrassed when I do it. My signature has devolved from a relatively legible cursive “Noah Stephens-Davidowitz” when I was in high school to my college “Noah S-D” to my post-graduation “N S-D” to my current series of four squiggles, which one might be persuaded are loosely derived from my initials together with a hyphen.

My business partner, Thomas, told me (I think only half-jokingly) that it was unacceptable for signing contracts with our clients. He accused me of just writing the number 900 lazily and sloppily. (I personally think it typically looks more like 907, but that day, I concede, it looked pretty 900ish.) Even people delivering food to my apartment, who only ask for my signature to protect themselves in the unlikely event that I later claim to have not received my food, have asked me to confirm that my signature is in fact a signature.

I would post an image of it for my readership to laugh at, but I suppose that that would be a bit of a security vulnerability.

But, isn’t that incredibly silly? I have in my mental possession four vaguely defined squiggles. For some reason, I’m forced to show people my squiggles all the time as some sort of confirmation that I, Noah Stephens-Davidowitz, am agreeing to something. And, it’s not just with delivery guys; I use my squiggles for extremely important interactions with governments, clients, my bank, etc. But, in spite of the fact that I show this thing all the time, I’m also keenly aware of the fact that posting it publicly on the internet is a terrible idea.

All of this is in case I at some point say “No, I never agreed to that.” Because of my signature, a slick lawyer could then confidently respond “But, if you didn’t agree to that, then why are these four squiggles here? Who but you could have squiggled four times on this piece of paper in such a way?”

Touché, slick lawyer.

Continue reading

What Color Is

Disclaimer: I sorta figured this stuff out on my own. I did a tiny bit of research for this post, but I’m not very well-read at all on this particular field. So, there’s a decent chance I got something wrong. Oops… Please don’t take what I write here as gospel, and please let me know what I screwed up in the comments.

When I was a kid, I was taught some really confusing and vague things about color. There were the Official Colors of the Rainbow–red, orange, yellow, green, blue, indigo, violet (Roy G. Biv). Three colors were always called primary colors, but there seemed to be disagreement about whether the third one was yellow or green. (Yellow seemed like the obvious choice, though, since green’s just a mixture of blue and yellow. Right? How would you even make yellow from red, green, and blue?) There were various formulas for combining colors together to get other colors when I was painting in art class (E.g., red and green makes brown). But then there were different rules for mixing different colors of light together–like how white light is apparently a mixture of all other colors of light or something. There was also that weird line about why the sky was blue, which seemed to me to be equivalent to “The sky is blue because it’s blue.”

All of this left me very confused, with a lot of unanswered questions: Why isn’t brown a color of the rainbow? Sure, it’s a mixture of other colors, but so are green and orange and purple and whatever the hell indigo is. If orange isn’t a primary color, does that mean that everything that’s orange is really just red and yellow? And why is violet all the way on the other side of the rainbow from red even though it’s just a mixture of red with blue?

At the time, I was too shy to ask because people seemed so incredibly not bothered by all of this. I assumed I must have been missing something obvious.

As it happens, what I was missing isn’t at all obvious, but it’s also pretty easy to explain. (In particular, kids should be taught this.) All of this confusion goes away when you realize what color is and how our eyes and brains measure it. I’ll give a characteristically wordy explanation:

Light, energy, etc.

Continue reading

The Nutrition Google Game

(Click here to play the game yourself.)

A friend and I just accidentally invented a fun game. I like it because it cutely illustrates the current completely ridiculous state of our understanding of human nutrition. Namely, there’s an amazingly long list of things that people think about nutrition–such a long list that it’s just really really hard to imagine even a small fraction of these things actually being true.

(I have a lot more to say about human nutrition than just this. In short, I am extremely skeptical of any claims about human nutrition. I sort of touched on this in this old blog post on my poker blog, and I might do a bit more in the future.)

Anyway, here’s the game:

  1. Choose something that people eat that some people think might have effects on your health (e.g. trans fats or calcium or acai berries).
  2. Choose a word or phrase that is somehow related to human health (e.g. “asthma” or “concentration” or “bad for you”).
  3. Google them together.
  4. Watch as hilarity ensues.