Purifying spoiled randomness with spoiled randomness

Recently a preprint was posted to ECCC, with the pulse-quickening title “Explicit Two-Source Extractors and Resilient Functions”. If the result is correct, then it really is — shall I say it — a breakthrough in theoretical computer science. One of my favorite things about TCS is how fresh it is. It seems like every other week, exciting results in all sorts of subareas are announced on arXiv or ECCC. I thought I’d use this as an opportunity to (briefly) explain what this breakthrough is about, for those who aren’t theoretical computer scientists.

Computer programs need sequences of random numbers all the time. Encryption, games, and scientific applications all need them. But these applications usually assume that they have access to completely unpredictable random number sequences that have no internal patterns or regularities.

But where in the world would we find such perfect, pristine sources of random numbers? How people generate sources of randomness is a fascinating subject in its own right, but long story short, it’s actually really, really hard to get our hands on such good random numbers. Pure randomness, as it turns out, is a valuable resource that is as rare as — perhaps even rarer than — gold or diamond.

TCS in the Big Apple – STOC 2014 Recaps (Part 1)

Ah, summertime. For many NotSoGITCSers, it means no classes, warm weather, ample research time… and a trip to STOC. This year the conference was held in New York City, only four hours away from Cambridge via sketchy Chinatown tour bus. Continuing in the great tradition of conference blogging by TCS blogs (see here, and here), NotSoGITCS will share some of our favorite papers from the conference. I’ll start this off with my recap on Rothvoss’s paper on extension complexity. Over the next few posts, we’ll hear from Clément Canonne and Gautam Kamath on Efficient Distribution Estimation, Pritish Kamath on the latest results in algebraic circuit complexity, Justin Holmgren on indistinguishability obfuscation, and Sunoo Park on coin flipping!

Henry Yuen on The matching polytope has exponential extension complexity, by Thomas Rothvoss

The Best Paper award for this year’s STOC went to this paper, which is another milestone result in the recent flurry of activity in lower bounds for extension complexity of combinatorial problems. This area represents a fruitful “meeting in the middle” between complexity theory and algorithms. Although a proof of P ≠ NP — a conjecture about the limitation of all polynomial time algorithms — seems astronomically distant, we can instead shoot for a closer goal: show that polynomial time algorithms we know of now cannot solve NP-complete problems. The area of extension complexity is about understanding the limitations of some of the finest weapons in our polynomial-time toolbox: LP and SDP solvers. As many of you know, these algorithms are tremendously powerful and versatile, in both theory and practice alike.

A couple years ago, STOC 2012 (also held in New York!) saw the breakthrough result of Fiorini, et al. titled Linear vs. Semidefinite Extended Formulations: Exponential Separation and Strong Lower Bounds. This paper showed that while powerful, LP solvers cannot naturally solve NP-complete problems in polynomial time. What do I mean by naturally solve? Let’s take the Traveling Salesman Problem for a moment (which was the subject of the aforementioned paper). One way to try to solve TSP on an n-city instance using linear programming is to try to optimize over the polytope PTSP where each vertex corresponds to a tour of the complete graph on n nodes. Turns out, this polytope has exponentially many facets, and so is a rather unwieldy object to deal with — simply writing down a system of linear inequalities defining PTSP takes exponential time! Instead, people would like to optimize over another polytope in higher dimension QTSP that projects down to PTSP (such a QTSP would be called a linear extension of PTSP), except now QTSP has a polynomial number of facets. Does such a succinct linear extension exist? If so, that would imply P = NP! The Fiorini, et al. paper showed that one can’t solve the Traveling Salesman Problem this way: the TSP polytope has exponential extension complexity — meaning that any linear extension QTSP of PTSP necessarily has exponentially many facets [1].

In a sense, one shouldn’t be too surprised by this result: after all, we all know that P ≠ NP, so of course the TSP polytope has exponential extension complexity. But we’ve learned more than just that LP solvers can’t solve NP-complete problems: we’ve gained some insight into the why. However, after the Fiorini et al. result, the natural question remained: is this exponential complexity of polytopes about NP-hardness, or is it something else? In particular, what is the extension complexity of the perfect matching polytope PPM (the convex hull of all perfect matchings in K_n)? This is an interesting question for a number of reasons: this polytope, like PTSP, has an exponential number of facets. However, unlike the TSP polytope, optimizing linear objective functions over PPM is actually polynomial-time solvable! This is due to the seminal work of Jack Edmonds, who, in the 1960s, effectively brought the notion of “polynomial time computability” into the (computer science) public consciousness by demonstrating that the maximum matching problem admits an efficient solution.

As you might have guessed from the title, Thomas Rothvoss showed that actually the matching polytope cannot be described as the projection of a polynomially-faceted polytope! So, while we have LP-based algorithms for optimizing over PPM, these algorithms must nontrivially rely on the structure of the matching problem, and cannot be expressed as generically optimizing over some succinct linear extension. I’ll say a few words about how he shows this. To lower bound the extension complexity of a polytope, one starts by leveraging Yannakakis’s Factorization Theorem, proved in his 1991 paper that started the whole area of extension complexity. His theorem says that, for a polytope P, \mathrm{xc}(P) = \mathrm{rank}_{+}(S_P), where the left-hand side denotes the extension complexity of P [2], and the right-hand side denotes the non-negative rank of the slack matrix of P [3]. Now, instead of worrying about every possible linear extension of P, we “only” have to focus our attention on the non-negative rank of S_P. It turns out that, fortuitously, we have techniques for lower bounding the non-negative rank of matrices from communication complexity (a connection that was also pointed out by Yannakakis). Roughly speaking, communication complexity tells us that the non-negative rank of a matrix is high if the matrix cannot be “described” as a small collection of combinatorial rectangles.

This idea is captured in what Rothvoss calls the “Hyperplane separation lower bound”: let S be an arbitrary non-negative m\times n matrix; and W be an arbitrary matrix of the same dimensions (not necessarily non-negative). Then \mathrm{rank}_+(S) \geq \langle W, S \rangle/ (\|S \|_\infty \alpha_W), where \langle W, S \rangle is the Frobenius inner product between W and S, \|S \|_\infty is the largest-magnitude entry of S, and \alpha_W := \max \{ \langle W, R \rangle : R \in \{0,1\}^{m \times n}\text{ is rank 1} \}. Intuitively, in order to show that the non-negative rank of S is large, you want to exhibit a matrix W (a “hyperplane”) that has large correlation with S, but has small correlation with any rectangle R. Rothvoss presents such a hyperplane W that proves the matching polytope has exponential extension complexity; the hard part is showing that \langle W, R \rangle \leq 2^{-\Omega(n)} for all rectangles R. To do so, Rothvoss follows a similar strategy to Razborov’s proof of the \Omega(n) lower bound on the randomized communication complexity of DISJOINTNESS, with substantial modifications to fit the matching problem.
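To get a feel for the bound, here is a minimal numerical sketch (my own toy example, not from the paper): take S to be the slack matrix of the unit square (rows indexed by vertices, columns by facets; see footnote [3] for the definition), pick the hyperplane W that is +1 on the support of S and -1 elsewhere, and brute-force \alpha_W over all combinatorial rectangles.

```python
import itertools
import numpy as np

# Slack matrix of the unit square [0,1]^2: rows are the vertices (0,0),(1,0),(1,1),(0,1),
# columns are the facets x>=0, x<=1, y>=0, y<=1.
S = np.array([[0, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 1, 0]])

# A "hyperplane" W: +1 where S is positive, -1 where S is zero.
W = np.where(S > 0, 1, -1)

# alpha_W = max over 0/1 rank-1 matrices R (i.e., combinatorial rectangles) of <W, R>.
# Brute force over nonempty row and column subsets; fine at this size.
m, n = W.shape
subsets = lambda k: itertools.chain.from_iterable(
    itertools.combinations(range(k), r) for r in range(1, k + 1))
alpha = max(W[np.ix_(rows, cols)].sum() for rows in subsets(m) for cols in subsets(n))

bound = (W * S).sum() / (S.max() * alpha)   # <W,S> / (||S||_inf * alpha_W)
print(alpha, bound)                         # alpha_W = 2, bound = 4.0
```

The bound gives \mathrm{rank}_+(S) \geq 4, so by Yannakakis’s theorem any extension of the square needs at least four facets (which the square itself achieves). Rothvoss runs the same mechanics on the matching polytope’s slack matrix, with a vastly more delicate choice of W.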

There are a number of reasons why I like these papers in extension complexity: they’re concrete, solid steps towards the dream of proving P ≠ NP. They’re short and elegantly written: the Fiorini, et al. paper is 24 pages; the Rothvoss paper is less than 20. They also demonstrate the unity of theoretical computer science: drawing upon ideas from algorithms, combinatorics, communication complexity, and even quantum information.

-Henry Yuen

[1] Well, now we know that these guys couldn’t have used a succinct extended formulation.

[2] \mathrm{xc}(P) is defined to be the minimum number of facets of any polytope that linearly projects to P.

[3] Let S be a non-negative matrix of size m\times n. Then \mathrm{rank}_+(S) = k if there exists a factorization S = TU where T and U are non-negative matrices, and T and U have dimensions m\times k and k\times n, respectively. Let P = \{x : Ax \leq b \} be a polytope defined by m inequalities, with vertices v_1,\ldots,v_n. Then the (i,j)th entry of the slack matrix S_P of P is defined to be b_j - A_j v_i, with A_j the jth row of A.
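To make this definition concrete, here is a minimal sketch (again my own toy example) computing the slack matrix of the unit square directly from A, b, and the vertex list; it is the same S used in the snippet above.

```python
import numpy as np

# The unit square as {x : Ax <= b}: rows of A encode -x <= 0, x <= 1, -y <= 0, y <= 1.
A = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
b = np.array([0, 1, 0, 1])
V = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])   # the vertices v_1, ..., v_4

S = b[None, :] - V @ A.T                         # (i,j) entry: b_j - A_j . v_i
print(S)
# [[0 1 0 1]
#  [1 0 0 1]
#  [1 0 1 0]
#  [0 1 1 0]]
```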

Can you tell if a bit is random?

Avi Wigderson has this wonderful talk about the power and limitations of randomness in computation, surveying the beautiful area of randomized algorithms, and the surprising connection between pseudorandomness and computational intractability. If you haven’t seen it already, I highly recommend it!

There is a part of the talk when Avi fearlessly engages the age-old question of the meaning of randomness. What does it mean for something to be random? We theoretical computer scientists often speak of “coin tosses”: many of our algorithms toss coins to decide what to do, and Alice and Bob toss coins to foil their mortal enemy Eve. Avi asks, “Is a coin toss random?”

Imagine that I hold in my hand a (fair) coin, and a second after I toss it high in the air, you, as you are watching me, are supposed to guess the outcome when it lands on the floor. What is the probability that you will guess correctly? 50-50 you say? I agree!

(Reproduced from the essay version of his talk). But now, Avi says, imagine that the coin toss was repeated, but this time you had some help: you were equipped with a powerful computing device, and accurate sensors, like an array of high speed video recorders. The coin suddenly starts to look a lot less random now! If your computer and sensors were good enough, you’d likely be able to predict the outcome of the coin toss nearly perfectly. In both cases the coin toss was the same, but the observer changed, and so did the amount of randomness in the situation. Thus, he contends, randomness is in the eye of the beholder.

From here, Avi launches into an excellent discussion of pseudorandomness and other topics. However, I’m going to forge onwards — foolishly, perhaps — with the original theme: what is randomness? As he pointed out, the randomness in the coin toss situation depended on the capabilities of the observer. Thus, one could argue that a coin toss is not truly random, because in principle you could predict the outcome. Is there a situation that is intrinsically random, no matter how powerful or omniscient the observer is? Put another way: is there a process by which I can generate a bit that no external super-being could possibly predict? Can you tell if a bit is random?

Enter the physics

Do you believe that the universe is governed by quantum mechanics? If so, then it seems that the matter is settled: quantum mechanics guarantees the existence of intrinsically random events. For those who know basic quantum, generating a truly random bit is easy: simply prepare a qubit state \frac{1}{\sqrt{2}}(|0\rangle + |1 \rangle) and measure in the computational basis — done! In fact, there are hardware random number generators that operate on this principle.
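For concreteness, here is a minimal numpy sketch of that recipe, with the obvious caveat that this is a classical simulation driven by a pseudorandom generator, so unlike a real qubit it produces no genuine randomness at all.

```python
import numpy as np

rng = np.random.default_rng()
plus = np.array([1, 1]) / np.sqrt(2)     # amplitudes of (|0> + |1>)/sqrt(2)
probs = np.abs(plus) ** 2                # Born rule: measuring |+> in the computational basis gives [0.5, 0.5]
bits = rng.choice([0, 1], size=20, p=probs)
print(bits)                              # each "measurement" outcome is a uniform bit
```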

But how do you actually prepare such a qubit, and perform the measurement? If you were a skilled experimentalist with access to perfectly manufactured mirrors and laser sources and detectors, then maybe. But then, how would you know these mirrors and lasers and detectors were built correctly?

For the rest of us down on Earth who don’t have time to learn experimental quantum physics and personally guarantee that all our laboratory equipment was built to specification, we’d like to be able to take some apparatus like a hardware random number generator and test whether its outputs were genuinely random — without having to open up the device and inspect its insides. Can we test randomness in a black box fashion?

It is easy to see that, if you only had input/output access to a black box device, there’s no test you can perform to reliably tell whether the device’s outputs are random. Intuitively, this is because for every test, you can construct a black box device that deterministically produces outputs that pass the test! Here’s a proof-by-Dilbert-cartoon:

What about statistical tests for randomness? Pseudorandom number generators are usually subject to an extensive battery of tests that check for suspicious patterns or regularities. Again, these are not fool-proof: one can design a fixed string that “diagonalizes” against all these heuristics and passes them all!

The power of non-locality

Black-box randomness testing seems like a hopeless task. But it turns out that, with a tiny extra assumption, it becomes possible! The added ingredient is non-locality.

Imagine that we had a hardware random number generator (HRNG) that we wanted to test, and suppose that the HRNG consisted of two separate pieces — call them Piece A and Piece B. Prevent A and B from communicating, either by surrounding them with Faraday cages or putting them on opposite sides of the galaxy. Now play the following game with them: generate two random bits x and y, and give x to A and y to B. Collect output bits a from A and b from B, and check whether a \oplus b = x \wedge y (where \oplus denotes the parity of the two bits, and \wedge their AND). If so, the devices win this curious little game.

Suppose A and B were completely deterministic devices: their outputs a and b were fixed deterministic functions of their inputs x and y, respectively. Then it is not hard to see that, over the random choices of x and y, they can win this game with probability at most 3/4.
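You can check this exhaustively: a deterministic strategy is just a pair of functions a(x), b(y) from \{0,1\} to \{0,1\}, so there are only 16 of them, and shared classical randomness is just a mixture of deterministic strategies, so it cannot help. A quick brute-force sketch (my own, assuming the game exactly as described above):

```python
from itertools import product

best = 0.0
for a in product([0, 1], repeat=2):           # a[x] is Piece A's answer on input x
    for b in product([0, 1], repeat=2):       # b[y] is Piece B's answer on input y
        wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
        best = max(best, wins / 4)
print(best)                                   # 0.75: no deterministic strategy beats 3/4
```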

What if you somehow knew that the devices were winning with probability greater than 3/4? What must you conclude? These devices — and hence their output bits — must be acting randomly. The killer fact is, it’s actually possible to win this game with probability strictly greater than 3/4! This can happen if Piece A and Piece B, though isolated, utilize something called quantum entanglement — a phenomenon that Einstein derisively termed “spooky action at a distance”.
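Here is a numpy sketch of the standard entangled strategy (the textbook CHSH angles; the particular convention is my choice): the two pieces share the state (|00\rangle + |11\rangle)/\sqrt{2}, and each measures in a basis rotated by an angle that depends on its input.

```python
import numpy as np

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
angle_A = [0, np.pi / 4]                         # A's measurement angle on input x
angle_B = [np.pi / 8, -np.pi / 8]                # B's measurement angle on input y

def basis_vector(theta, outcome):
    # outcome 0 -> cos(t)|0> + sin(t)|1>; outcome 1 -> the orthogonal vector
    return np.array([np.cos(theta), np.sin(theta)]) if outcome == 0 else \
           np.array([-np.sin(theta), np.cos(theta)])

win = 0.0
for x in (0, 1):
    for y in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                v = np.kron(basis_vector(angle_A[x], a), basis_vector(angle_B[y], b))
                p = abs(v @ phi_plus) ** 2       # Born rule for this pair of outcomes
                if (a ^ b) == (x & y):
                    win += p / 4                 # the inputs x, y are uniform
print(win)                                       # ~0.8536 = cos^2(pi/8) > 3/4
```

The winning probability cos^2(\pi/8) \approx 0.854 is in fact the best any quantum strategy can do (Tsirelson’s bound).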

We’ve made some headway on randomness testing, it seems. First, this game I’ve described does not require you to peer inside the HRNG. Second, if the HRNG wins the game with high enough probability, you’ve certified that its outputs must contain some randomness. Third, quantum mechanics says it is possible to build a HRNG to win the game. Finally, as a bonus: you don’t need to believe in quantum mechanics in order to trust the conclusion that “If the devices win with probability greater than 3/4, they must be producing randomness”!

What I’ve described to you is a distillation of the famous Bell’s Theorem from physics, which says that quantum mechanics is inconsistent with any so-called “hidden variable” theory of nature that respects locality (i.e. you can separate systems by distances so they cannot communicate or influence each other). This is why people say that quantum mechanics exhibits “non-local” behavior, because games like the one above can be won with probability greater than 3/4 (which has been demonstrated experimentally). Bell’s Theorem has been historically important in establishing a dividing line between classical and quantum theories of nature; but today we have a new interpretation: one can view Bell’s Theorem — published half a century ago — as saying that we can meaningfully perform randomness testing.

Randomness expansion — to infinity and beyond!

“Wait a minute,” you say. “While you’ve presented a test for randomness, this isn’t so useful — to even run the test you need 2 bits of randomness to generate x and y! Why test a HRNG if you already have randomness to begin with?”

True. A HRNG test isn’t very useful when it requires more randomness than is produced. It’s like the current state of fusion reactors: they produce less energy than is required to run them. But researchers in quantum cryptography realized that you could get a randomness-efficient test, using less randomness than the amount that is produced by the HRNG. For example, one can start with 100 random bits, run one of these randomness-efficient tests, and end up with, say, 200 random bits, a net gain of 100! In a sense, the amount of randomness you now possess has been expanded — hence these tests are called randomness expansion protocols.

Roger Colbeck, in his Ph.D. thesis from 2006, came up with a protocol that uses m bits of initial seed randomness, and certifies cm bits of output randomness using a HRNG that consists of 3 isolated parts, where c is some constant (approximately 3/2). Later in 2010, Pironio et al. gave a protocol that expands m bits of seed randomness to m^2 certified bits. In 2011, two simultaneous works — one by Vazirani and Vidick, and one by Fehr, et al. — demonstrated a protocol that attained exponential expansion: starting with m bits, one can certify that a 2-part HRNG produces a whopping 2^m certified bits of randomness!

While 2^{100} random bits starting from a mere 100 is probably more than you’d ever need, the tantalizing question still stands: is there a limit to how much randomness you can certify, starting with a finite amount of seed randomness? In the upcoming Quantum Information Processing conference, Matt Coudron and I will show that there is no upper limit, and that infinite randomness expansion is possible. We give a protocol where, starting with say 1000 random bits, you can command a finite number of devices (which is eight in our protocol) to produce an unbounded amount of randomness. Furthermore, this protocol is immune to any secret backdoors installed in your HRNG (e.g. by a certain security agency) — the output of your HRNG is guaranteed to be secure against any external eavesdropper.

Classical control of quantum systems

Let’s take a step back from this frenetic expansion of randomness, and appreciate the remarkable nature of these protocols. Note that, this entire time, we know almost nothing about the HRNG being tested. It could have been specially designed by some malicious adversary to try to fool your test as much as possible. It could have at its disposal enormous computational power, like access to the Halting Problem oracle. It could even use quantum entanglement in some yet-unimagined way to try to defeat your test. Yet, by isolating parts of the HRNG from each other, and employing a simple check on their input and outputs, a limited classical being can force the HRNG to produce a long stream of random bits.

In a time when random number generators can be stealthily compromised by altering the dopants in a microchip, or by the intentional weakening of cryptographic standards by government agencies, the ability to rigorously and soundly test randomness has outgrown its philosophical origins in the nature of physical law, and become something of operational significance.

To return to our original question, though: can we tell if a bit is random? We can, but only by employing some of the deepest ideas drawn from theoretical computer science and quantum physics. More broadly, this paradigm of classically controlling untrusted quantum devices has given rise to what’s called device independent quantum information processing. Not only can we classically manipulate quantum devices to produce genuine randomness, we can do a whole lot more: today we know how to securely exchange secret keys and even perform general quantum computations in a device independent manner! The scope and potential of device independent QIP is enormous, and I think we’ve only gotten started.

Henry Yuen

FOCS 2013 Recaps, Part 2

We continue our FOCS 2013 recaps (Part 1) with recaps by Clément (a guest blogger from Columbia) and Michael.

Clément Canonne on Learning sums of independent integer random variables by Constantinos Daskalakis, Ilias Diakonikolas, Ryan O’Donnell, Rocco Servedio, and Li-Yang Tan.

Background: the problem of inferring the distribution of some data, given only random elements sampled from it, is a fundamental and longstanding question in statistics, science, life and most likely even more. Although a recent line of work in the TCS community has helped us understand this question, the general case — \epsilon-approximating an unknown distribution on n elements, with no assumption on it — is well-known to be (a) hard, and (b) hard (that is, \Theta(n/\epsilon^2) samples needed, and well… n is big). For this reason, a lot of focus has been drawn to the study of particular, “natural” types of distributions that may arise: e.g., mixtures of well-known distributions, or distributions with some promise on their “shape” (monotone, unimodal, log-concave…), the hope being to do better, or to prove one can’t.

For instance, take a biased coin. For quite a few decades, it’s been known that estimating the bias to \pm\epsilon accuracy requires \Theta(1/\epsilon^2) samples. What about a sum of n independent coins, each of them with their own bias†? Odd as it may seem, up till last year a non-trivial upper bound was not known [DDS12] — it turns out that for this sort of random variable, O(1/\epsilon^3) samples are enough.

(a quick note here: of course, learning the n different biases would require a dependence on n; the problem here is to approximate the density of the distribution, i.e. come up with any type of distribution which is statistically close to the unknown one)

In this talk, Li-Yang presented an exciting result for an even more general class of distributions, glamorously named k-SIIRVs. If you remember the example two paragraphs above, there was a sum of n differently-biased coins; now, replace the coins by dice — biased dice, each different, each with k faces. To get an outcome, you throw them all, sum everything, and only remember the sum. More formally, a k-sum of integer-valued independent random variables is of the form S = X_1+\dots + X_n, where the X_j's are arbitrary independent random variables taking value in \{0,\dots,k-1\} (think of k as a small constant, e.g. 20 or 10^{47}). The question they ask is not “can we learn the distribution of S?” (yes, of course: \Theta(kn/\epsilon^2)), but rather “can we learn this distribution with o(n) samples?” And — surprisingly — the answer is yes. In more detail: you can learn k-SIIRVs in time and sample complexity \mathrm{poly}(k,1/\epsilon), independent of n. You can throw as many dice as you want!

A first observation: it’s not obvious. Intuitively, one may think something along the lines of “sure! we can do it with k=2 (the Poisson Binomial Distribution result mentioned below), let’s just generalize to k=k”. Turns out it does not work that well — a PBD is still nice (i.e., unimodal, log-concave, well-behaved): even a 3-SIIRV can be a nightmare (\Omega(1)-far from unimodal, \Omega(1)-far from log-concave, \Omega(1)-far from anything you wish for). So they came up with something else, and proved the following key result: if S is a k-SIIRV, it can essentially be decomposed as S \approx cZ + Y, where Z is a discrete Gaussian with the “right” mean, Y is a k-sparse (“nice”) random variable, and c\in[k-1] is an integer. With this characterization in hand, the learning algorithm essentially follows: for all possible choices of c, try to learn the corresponding Z and Y; this yields a pool of k candidates for the final hypothesis, one of them at least being good. Now let them fight, pick the winner.
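To see why 3-SIIRVs are already nastier than Poisson Binomial Distributions, here is a toy instance of my own (not from the paper): take each X_i uniform on \{0, 2\}. The sum S lands only on even integers, so its density has a gap at every odd point (far from unimodal), and yet it has exactly the cZ + Y shape with c = 2 and Z a discrete Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 50, 100_000
X = 2 * rng.integers(0, 2, size=(samples, n))   # each X_i uniform on {0, 2}
S = X.sum(axis=1)                               # a 3-SIIRV (values in {0,1,2}; 1 simply unused)

counts = np.bincount(S, minlength=2 * n + 1)
print(counts[46:56])                            # nonzero at even values, zero at every odd one
```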

Of course, the actual analysis and proofs are way more involved, and for this they use machinery beyond the scope of my memory, talent and this small note; but — and that was the great force of this talk — the speaker explained their work clearly, with simple examples and catchy pictures, and managed to give a very good sense of what was going on — while nonchalantly sipping a cup of tea.

(Or coffee, maybe.)

† a.k.a. a Poisson Binomial Distribution: which has nothing to do with Poisson distributions, of course.

Michael Forbes on Polar Codes: Speed of polarization and polynomial gap to capacity, by Venkatesan Guruswami and Patrick Xia.

Consider the binary symmetric channel: you can send bits through this channel, and each will be flipped independently with probability p. We want an optimal scheme for sending information reliably through this channel. In particular, the scheme should take a parameter \epsilon > 0 and devise a way to send N bits through the channel to reliably transmit (1-H(p)-\epsilon)N actual bits of information, so that the failure probability is at most \epsilon.  Further, the scheme should be explicit, and N should be \mathrm{poly}(1/\epsilon).

Arikan (2008) gave a scheme called polar codes meeting these requirements, except that N had to be very large for the scheme to succeed.  Indeed, no concrete bounds were actually known, and the work of Guruswami-Xia shows that N=\mathrm{poly}(1/\epsilon) suffices.

To understand polar codes, first consider the following simpler problem. Given N Bernoulli(p) bits (each being 1 independently with probability p), devise a map from N bits to \approx H(p)N bits that is injective with high probability. It is not too hard to do so, since there are only \approx 2^{H(p)N} N-bit strings that appear with noticeable probability, so we can just list the noticeable strings in some enumeration (taking \approx H(p)N bits), and discard the negligible strings.

Now suppose you had such a map that was linear over \mathbb{F}_2. The claim is that this would yield a good scheme for coding in the binary symmetric channel. Specifically, let the map be defined by the H(p)N \times N matrix A. Let your “messages” be those strings in the \mathbb{F}_2-nullspace of A. It is not hard to see that there are \approx 2^{(1-H(p))N} such strings. Suppose now we send such a message m through the channel. Then the channel will yield m+e, where e is a binary string with each bit independently being 1 with probability p. It follows then that applying A yields A(m+e)=Ae. But A is injective with high probability on such strings e, so Ae will determine e, from which point we can determine m since we have m+e.
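This “messages live in the nullspace of A” trick is classical syndrome decoding. Here is a hedged toy instance of my own (the Hamming(7,4) code correcting a single flipped bit; not a polar code, and a single-bit error rather than a full BSC, but it shows the A(m+e) = Ae mechanics):

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code; column j is the binary expansion of j+1.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def decode(received):
    syndrome = H @ received % 2            # H(m + e) = He, since Hm = 0 for codewords m
    if not syndrome.any():
        return received                    # no error detected
    pos = int(syndrome @ [1, 2, 4]) - 1    # the syndrome spells out the error position
    corrected = received.copy()
    corrected[pos] ^= 1
    return corrected

m = np.array([0, 1, 1, 0, 0, 1, 1])        # a codeword: H @ m % 2 == 0
e = np.zeros(7, dtype=int)
e[4] = 1                                   # the channel flips one bit
print(decode((m + e) % 2))                 # [0 1 1 0 0 1 1] -- the message m is recovered
```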

The above thus reduces the problem of communication over the binary symmetric channel to that of compression.  Arikan’s polar codes then solve this compression problem as follows.  First, an N\times N matrix A is devised so that given the random vector e, the resulting random vector Ae has the polarization property.  That is, if you look at the coordinates of Ae in order, then each coordinate (as a random variable) is either fully determined or fully independent of the previous coordinates (as random variables).  Then, one can see that those coordinates of Ae that are fully determined are not helpful in establishing an injective map, so can be discarded.  This discards (1-H(p))N rows of the matrix, leaving \approx H(p)N rows by conservation of entropy, and thus solves the compression problem.

To construct such a matrix A, one can start with 2 Bernoulli(p) bits x,y.  If we then transform those to x'=x\oplus y and y'=y, we first see that x' is now Bernoulli(2p(1-p)), and 2p(1-p) > p (for 0 < p < 1/2). Thus, the entropy of x' is larger than that of x.  Individually, the entropy of y' is clearly the same as the entropy of y, but when we read the bits in order things become interesting. That is, while x and y are independent, x' and y' are not, and in particular reading x' first lowers the conditional entropy of y'.
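A quick numerical check of this 2-bit step (a sketch of my own, using the binary entropy function h): total entropy is conserved but redistributed, so x' = x \oplus y carries more than h(p) bits while y' carries less than h(p) bits once x' is known.

```python
import numpy as np

def h(q):                                   # binary entropy in bits
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p = 0.11
H_x_prime = h(2 * p * (1 - p))              # x' = x XOR y is Bernoulli(2p(1-p))
H_y_given_x_prime = 2 * h(p) - H_x_prime    # chain rule: H(x', y') = H(x, y) = 2 h(p)
print(h(p), H_x_prime, H_y_given_x_prime)   # ~0.50, ~0.71, ~0.29 -- a slight polarization
```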

This 2-bit process induces a slight polarization, and now one recurses. That is, given 4 bits, first apply the 2-bit process to two pairs separately.  Then, one has 2 of the bits being “higher entropy” and two bits being “lower entropy”, at which point the process can be repeated under this new grouping.  The challenge in the analysis is: does this eventually lead to near full polarization? If so, how does the speed of polarization depend on N?

Arikan showed that the answer to the first question is a resounding yes, but used the martingale convergence theorem, which does not address the second question. Guruswami-Xia address this second concern by studying this process further and giving quantitative bounds on how the polarization improves in each step.

FOCS 2013 Recaps, Part 1

Editor’s note: don’t forget to add the blog to your favorite RSS reader (link: https://mittheory.wordpress.com/feed/), and follow our twitter account!

The Foundations of Computer Science — or FOCS for short — conference needs no introduction. Along with STOC and SODA, it is one of the juggernaut meetups in the world of theoretical computer science. This year it was held at the Berkeley Marina, which had the benefit that in those spare moments in between talks, we could meander outside the hotel and soak in the rather pleasant view of the Berkeley yacht club, with misted downtown San Francisco far in the distance.


Of course, we weren’t there for (just) the pretty views and warm weather: from workshops on “real stable polynomials” (including their application to solving the Kadison-Singer Problem!), to OSNAPs in numerical linear algebra, to nearly maximum flows in nearly linear time, there was plenty of theory to be had.

If for some unfortunate reason you weren’t able to attend this year, then I regretfully must inform you that you missed a concert featuring Christos Papadimitriou and the Positive Eigenvalues! But not all is lost, because several students who did attend have selflessly agreed to share their “best of” recaps of FOCS 2013, to tell you about their favorite papers and a brief summary of the talks.

Today, Matt Weinberg will tell us about “A Polynomial Time Algorithm for Lossy Population Recovery”, “The Simple Economics of Approximately Optimal Auctions”, and “Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees”. Next time Clément Canonne and Michael Forbes will share their recaps.

Recaps by Matt Weinberg

A Polynomial Time Algorithm for Lossy Population Recovery
Ankur Moitra and Michael Saks

This paper studies the following problem: imagine that species in a population can be represented as k-bit strings. Then each k-bit string has some probability of occurring in the population. Given access to samples from the population, one would like to recover these probabilities. If this is the model, then the obvious simple algorithm is the best one can do. Instead, the paper assumes that the samples are corrupted in the following way: each bit of the sample is replaced with a “?” independently with probability p. One would still like to use these corrupted samples to learn the population. The main result of this paper is an algorithm that learns the distribution within \epsilon and requires only a polynomial number of samples (in k and \epsilon^{-1}) for any constant corruption probability p > 0.

Here are the high level ideas of the proof:

  • There’s a known reduction that says if you can approximately recover the probability of the all-zero string, then you can approximately recover the probability of all strings.
  • Let b be a vector where b_i denotes the probability that an uncorrupted sample from the population has i ones. Let also c be a vector where c_i denotes the probability that a corrupted sample from the population has i ones. Then there is an easily computable linear transformation A such that Ab = c, where A_{ij} is just the probability that exactly i of j ones remain after corruption. So we can also compute A^{-1}. We can then take corrupted samples to estimate c, and estimate b = A^{-1}c. Unfortunately, the coefficients in A^{-1} are so big that we’d need an exponentially good approximation to c in order to get any meaningful approximation to b by this approach (a small numerical illustration of this blow-up appears right after this list).
  • Instead, they add some error matrix E to A^{-1}. Doing so has two effects: one, it changes the magnitude of terms in A^{-1} and two, it adds additional error to the recovered b_0 term. They write an LP to see if any error matrix exists that both makes the terms of A^{-1} + E small enough so that they don’t need too many samples, and doesn’t add too much additional error to the recovered term.
  • They show that the dual of this LP corresponds to searching for a certain kind of polynomial. Any such polynomial that exists implies an upper bound on how good an error matrix can be.
  • Using tools from complex analysis (specifically, Hadamard Three-Circle Theorem), they show that no polynomial exists that can imply a strong upper bound. Therefore, there are good error matrices and their LP will find one.
  • So in summary, their algorithm takes samples to estimate this vector c, where c_i is the probability that a corrupted sample has exactly i ones. Then, they solve their LP to find a good error matrix E. Then they compute (A^{-1} + E)c to recover a decent approximation to the vector b, where b_i is the probability that an uncorrupted sample had exactly i ones. Then they plug this into the reduction that we mentioned in the beginning.
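To see the blow-up from the second bullet concretely, here is a small numerical illustration of my own: build the corruption matrix A_{ij} = \binom{j}{i}(1-p)^i p^{j-i} and look at how large the entries of A^{-1} get as k grows.

```python
import numpy as np
from math import comb

p = 0.5
for k in (5, 10, 20, 30):
    A = np.array([[comb(j, i) * (1 - p) ** i * p ** (j - i) for j in range(k + 1)]
                  for i in range(k + 1)])       # triangular with nonzero diagonal, so invertible
    print(k, np.abs(np.linalg.inv(A)).max())    # ~80, ~1.3e4, ~6.4e8, ~3.2e13: exponential in k
```

So any sampling error in the estimate of c gets multiplied by these enormous coefficients, which is exactly why the paper perturbs A^{-1} by an error matrix E instead of inverting naively.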

The Simple Economics of Approximately Optimal Auctions
Saeed Alaei, Hu Fu, Nima Haghpanah, and Jason Hartline

This paper studies the following problem: A salesman has multiple goods to sell to m unit-demand buyers. There may be some feasibility constraints on which agents can simultaneously receive goods. The seller wishes to auction the goods in a way that maximizes his revenue. With only a single good to sell, an interpretation of Myerson’s optimal auction by Bulow and Roberts shows that the optimal auction has a particularly simple form and can be interpreted as maximizing “marginal revenue” (discussed below). The main result of this paper is a characterization of optimal auctions with multiple goods as maximizing marginal revenue as well.

To understand their result, first consider the single item case. When selling a single item to a single buyer, it’s well known that the best thing to do is set a take-it-or-leave-it price p, which the buyer will choose to take with probability q. For any q in [0,1], one can define \mathrm{REV}(q) to be the revenue obtained by choosing a price that the buyer takes with probability exactly q. One can also define the marginal revenue to be \mathrm{REV}'(q). The Bulow-Roberts interpretation of Myerson’s optimal auction says exactly that the optimal auction for m buyers is to ask buyers to report bids b_i, map bids to probabilities q_i (the probability that buyer i would purchase the item at price b_i), and then give the item to the agent maximizing \mathrm{REV}'(q_i). In other words, always give the item to the agent with the maximum marginal revenue.
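Here is a small sketch of that picture for a single buyer whose value is Uniform[0,1] (my own toy example, not from the paper): posting price p sells with probability q = 1 - p, so \mathrm{REV}(q) = q(1-q), and the marginal revenue \mathrm{REV}'(q) = 1 - 2q coincides with Myerson’s virtual value v - (1-F(v))/f(v) = 2v - 1 evaluated at the posted price.

```python
import numpy as np

q = np.linspace(0.01, 0.99, 99)               # probability of sale
price = 1 - q                                 # for Uniform[0,1], price p sells with prob 1 - p
rev = q * (1 - q)                             # REV(q) = q * price
marginal_rev = np.gradient(rev, q, edge_order=2)
virtual_value = 2 * price - 1                 # Myerson's virtual value at v = price
print(np.max(np.abs(marginal_rev - virtual_value)))   # ~1e-15: they coincide
```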

In the multi-item case, not much is known even about how to sell many items to a single agent, so it’s not even clear that an analogue of marginal revenue makes sense. The paper proposes defining \mathrm{REV}(q) to be the maximum revenue that could possibly be obtained by selling the items to a single agent in a way such that the agent will receive something with probability exactly q. Specialized to the single-item case, this recovers exactly the original definition. In addition, they give a natural mapping from bids (which are now vectors) to probabilities, and define the marginal revenue mechanism to first map bids to probabilities q_i, then select the feasible set of agents with the maximum value of \mathrm{REV}'(q_i) to be awarded an item. The item they receive is exactly what they would get in the single-agent scheme that defines \mathrm{REV}(q_i). Unlike the single-item setting, the marginal revenue mechanism may not always be optimal. But the main result of the paper shows that the mechanism is in fact optimal if a natural condition (called revenue-linearity) on the input instance holds. Their result is robust to approximation as well: if their condition only holds approximately, then the mechanism is approximately optimal. They also give some specific instances of multi-dimensional settings where revenue-linearity holds exactly and approximately.

Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees (Best Paper Award)
Adam Marcus, Daniel Spielman and Nikhil Srivastava

A d-regular graph is said to be Ramanujan if all of its non-trivial eigenvalues (the ones that aren’t d or -d) are at most 2\sqrt{d-1} in absolute value. This paper addresses the question of whether or not an infinite sequence of d-regular Ramanujan graphs exists for all d > 2, and shows that such a sequence of bipartite Ramanujan graphs does in fact exist for every d > 2.
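As a sanity check on the definition, here is a tiny example of my own: the complete bipartite graph K_{3,3} is 3-regular and bipartite, its spectrum is \{3, 0, 0, 0, 0, -3\}, and the non-trivial eigenvalues sit comfortably below 2\sqrt{d-1} = 2\sqrt{2}.

```python
import numpy as np

d = 3
A = np.block([[np.zeros((3, 3)), np.ones((3, 3))],    # adjacency matrix of K_{3,3}
              [np.ones((3, 3)), np.zeros((3, 3))]])
eigs = np.sort(np.linalg.eigvalsh(A))
nontrivial = eigs[1:-1]                               # drop the trivial eigenvalues -d and d
print(eigs, bool(np.all(np.abs(nontrivial) <= 2 * np.sqrt(d - 1))))   # ... True: Ramanujan
```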

They obtain their result by proving a conjecture of Bilu and Linial on the existence of certain kinds of graphs. Specifically, the conjecture is that every d-regular graph has a signing (that is, a weighting of all edges to be either +1 or -1), such that all eigenvalues of the signed graph are at most 2 \sqrt{d-1} in absolute value. The paper proves a restricted version of this conjecture for bipartite graphs. Once this conjecture is proved, existing machinery by Bilu and Linial yields the main theorem. Below is a high level overview of the proof:

  • Consider randomly assigning each edge to be +1 or -1 independently with probability 1/2. One could then compute the expected largest eigenvalue of this procedure and hope that it’s at most 2\sqrt{d-1}. Unfortunately, this isn’t the case. Instead, they compute the expected characteristic polynomial. That is, the average over all 2^{|E|} signings of the characteristic polynomial of that signed adjacency matrix.
  • It turns out that this expected polynomial has already been computed by Godsil and Gutman: it is the matching polynomial of the original graph. That is, the coefficient of x^{n-2i} is (-1)^i times the number of matchings with exactly i edges (a brute-force check of this fact on a small graph appears right after this list).
  • Now that they have the expected polynomial, they want to compute its largest root. Fortunately, it turns out that this has also already been done: Heilmann and Lieb showed that all roots are real and that the largest root is at most 2\sqrt{d-1}.
  • So now they have to relate the roots of the expected characteristic polynomial to roots of actual signings. So the question they address is: When do the roots of an average polynomial tell you anything about the roots of those being averaged? The answer is usually “absolutely nothing”, but the main technical result of this paper is a sufficient condition for the answer to be “something useful.” They give a definition called interlacing families of polynomials, and show that for all interlacing families, there exists a polynomial in the family whose largest root is no larger than the largest root of the averaged polynomial.
  • The penultimate step is showing that the characteristic polynomials of the signings form an interlacing family (which seems quite involved). With this, they’ve now shown that there exists a signing with a characteristic polynomial whose largest eigenvalue is no larger than the largest root of the expected polynomial, which is at most 2\sqrt{d-1}. The last step is observing that if G is bipartite, then all eigenvalues come in pairs of the form \{\lambda, -\lambda\}. Therefore, every bipartite graph has a signing with all eigenvalues having absolute value at most 2\sqrt{d-1}.
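Here is a brute-force check of the Godsil–Gutman fact from the second bullet, on the 4-cycle C_4 (my own toy example): averaging the characteristic polynomials over all 2^{|E|} signings gives exactly the matching polynomial x^4 - 4x^2 + 2 (one empty matching, four single edges, two perfect matchings).

```python
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]            # the 4-cycle C_4
avg = np.zeros(5)
for signs in itertools.product([1, -1], repeat=len(edges)):
    B = np.zeros((4, 4))
    for (u, v), s in zip(edges, signs):
        B[u, v] = B[v, u] = s                       # signed adjacency matrix
    avg += np.poly(B)                               # coefficients of det(xI - B), highest degree first
avg /= 2 ** len(edges)
print(np.round(avg, 6))                             # [1, 0, -4, 0, 2]  ==  x^4 - 4x^2 + 2
```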

Welcome!

Welcome to Not So Great Ideas in Theoretical Computer Science (notsogitcs), the blog of the Theory of Computation group at MIT! Here, we hope to give you a glimpse of the exciting world of theoretical computer science through the eyes of those working in its trenches – the students (graduate and undergraduate), the postdocs, and maybe even some of the faculty. What can you expect from us here? Well, in addition to lengthy technical discussions on the latest and greatest in approximation algorithms and complexity lower bounds and algorithmic game theory — those will come, don’t worry — we also plan to share different aspects of Theory at MIT, from the annual “CornFest” and the student theory retreat to our “Not so Great Ideas in Theoretical Computer Science” tradition! Coming soon:

  • Student theory retreat: Aloni and Themis‘s tale, through words and pictures, of upstate New York, 5 minute research introduction madness, and an unconventional way of learning random matrix theory.
  • FOCS 2013 recaps: highlights of the most recent Foundations of Computer Science conference, held at Berkeley, CA (part 1, part 2).
  • Better algorithms for Approximate Nearest Neighbor: Ilya‘s synopsis of their exciting paper in SODA 2014.
  • A Fountain of Randomness: Henry on an Infinite Randomness Expansion Machine.
  • …and more!

Stay tuned!

Henry