Estimating Transitive Closure via Sampling

In this post, I describe an algorithm of Edith Cohen that estimates the size of the transitive closure of a given directed graph in near-linear time. This simple but extremely clever algorithm uses ideas somewhat similar to the Flajolet–Martin algorithm for estimating the number of distinct elements in a stream, and to the MinHash sketch of Broder^1.

Suppose we have a large directed graph with n vertices and m directed edges. For a vertex v, let us denote by R_v the set of vertices that are reachable from v. There are two standard ways to compute all the sets R_v at once: run a graph search (BFS or DFS) from every vertex, which takes time O(nm) in total, or compute the transitive closure via fast matrix multiplication in time around O(n^{\omega}); both are far from linear for large sparse graphs.

Can we do better? Turns out we can, if we are OK with merely approximating the size of every R_v. Namely, the following theorem was proved back in 1994:

Theorem 1. There exists a randomized algorithm for computing (1 + \varepsilon)-multiplicative approximation for every |R_v| with running time \varepsilon^{-2}\cdot m \cdot \mathrm{poly}(\log n).

Instead of spelling out the full proof, I will present it as a sequence of problems: each of them will likely be doable for a mathematically mature reader. Going through the problems should be fun, and besides, it will save me some typing.

Problem 1. Let f \colon V \to [0, 1] be a function that assigns random independent and uniform reals between 0 and 1 to every vertex. Let us define g(v) = \min_{w \in R_v} f(w). Show how to compute values of g(v) for all vertices v at once in time m \cdot \mathrm{poly}(\log n).
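To make Problem 1 concrete, here is a minimal sketch (the code and all names in it are my own illustration, not Cohen's) for the special case of a DAG: there g(v) = \min(f(v), \min_{u} g(u)) over the successors u of v, so one reverse-topological sweep suffices. A general digraph reduces to this case by contracting strongly connected components, since all vertices in a component share the same reachable set.

```python
from collections import defaultdict, deque

def reachability_minima_dag(n, edges, f):
    """g[v] = min of f over all vertices reachable from v (including v itself)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm produces a topological order of the DAG.
    order, queue = [], deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Sweep in reverse topological order: g(v) = min(f(v), min of g over successors).
    g = list(f)
    for u in reversed(order):
        for v in succ[u]:
            g[u] = min(g[u], g[v])
    return g

print(reachability_minima_dag(4, [(0, 1), (1, 2), (0, 3)], [0.9, 0.5, 0.1, 0.7]))
# [0.1, 0.1, 0.1, 0.7]: vertex 0 reaches everything, vertex 3 reaches only itself
```

Each edge is touched once, so the sweep itself runs in O(n + m) time.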

Problem 2. For a positive integer k, denote U_k the distribution of the minimum of k independent and uniform reals between 0 and 1. Suppose we receive several independent samples from U_k with an unknown value of k. Show that we can obtain a (1 + \varepsilon)-multiplicative approximation of k with probability 1 - \delta using as few as O(\log(1 / \delta) / \varepsilon^2) samples.
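One concrete estimator for Problem 2 (my own choice; the problem admits others): if m is the minimum of k independent uniforms, then -\ln(1 - m) is exponentially distributed with rate k, so summing these transformed samples and inverting estimates k.

```python
import math
import random

def estimate_k(samples):
    """Each -ln(1 - m) is Exp(k), so n / (their sum) estimates the rate k."""
    return len(samples) / sum(-math.log(1.0 - m) for m in samples)

random.seed(42)
k, n = 1000, 5000
# Inverse-transform sampling: the min of k uniforms is distributed as 1 - u^(1/k).
samples = [1.0 - random.random() ** (1.0 / k) for _ in range(n)]
print(estimate_k(samples))  # close to 1000
```

The sum of n independent Exp(k) variables concentrates around n/k with relative standard deviation 1/\sqrt{n}, which matches the O(1/\varepsilon^2) sample bound in the problem statement.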

Problem 3. Combine the solutions to the two previous problems and prove Theorem 1.
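For intuition, here is a toy end-to-end combination on a DAG (again my own illustration, with none of the polylog bookkeeping of the actual proof): repeat the labeling-and-minima computation t times and feed the t minima observed at each vertex to the estimator.

```python
import math
import random
from collections import defaultdict, deque

def topological_order(n, succ):
    indeg = [0] * n
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    order, queue = [], deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

def estimate_reach_sizes(n, edges, t=400):
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    order = topological_order(n, succ)
    sums = [0.0] * n
    for _ in range(t):
        f = [random.random() for _ in range(n)]
        g = list(f)  # g[v] = min of f over R_v, via a reverse topological sweep
        for u in reversed(order):
            for v in succ[u]:
                g[u] = min(g[u], g[v])
        for v in range(n):
            sums[v] += -math.log(1.0 - g[v])
    return [t / s for s in sums]  # estimates of |R_v|

random.seed(0)
# Path 0 -> 1 -> 2 -> 3, so the true sizes |R_v| are 4, 3, 2, 1.
print([round(x, 1) for x in estimate_reach_sizes(4, [(0, 1), (1, 2), (2, 3)])])
# roughly [4.0, 3.0, 2.0, 1.0]
```

With t repetitions the relative error is about 1/\sqrt{t}, in line with Theorem 1's \varepsilon^{-2} factor.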

Footnotes

  1. These similarities explain my extreme enthusiasm towards the algorithm. Sketching-based techniques are useful for a problem covered in 6.006, yay!

Ilya

Better Circuit Lower Bounds for Explicit Functions

Below is a post by Ilya on a new circuit lower bound.


Let f \colon \{0, 1\}^n \to \{0, 1\} be a Boolean function of n arguments. What is the smallest circuit that can compute f if the set of allowed gates consists of all the unary and binary functions? The size of a circuit is simply the number of gates in it.

It’s easy to show that a random function requires circuits of size \Omega(2^n / n) (this is tight in the worst case), but a big question in computational complexity is to provide a simple enough function that requires large enough circuits to compute.

Basically, if one comes up with a function that lies in \mathrm{NP} and requires circuits of size super-polynomial in n, then \mathrm{P} \ne \mathrm{NP}, and one of the Millennium Problems is solved!

How far are we from accomplishing this? Well, until recently, the best lower bound for a function from NP was 3n [Blum 1984] (remember that eventually we are aiming at super-polynomial lower bounds!).

I’m very excited to report that very recently a better lower bound has been proved by Magnus Gausdal Find, Sasha Golovnev, Edward Hirsch and Sasha Kulikov! Is it super-polynomial in n? Is it at least super-linear in n?

Well, not really. The new lower bound is 3.011n! You may ask: why should one be excited about such a seemingly weak improvement (which, nevertheless, took the authors several years of hard work, as far as I know)?

One reason is that this is the first progress on one of the central questions in computational complexity in more than 30 years. In general, researchers like to invent new models and problems and tend to forget good old “core” questions very easily. I’m happy to see this not being the case here.

Besides, the new result is in fact much more general than, say, Blum's. The authors prove that any function that is non-constant on every affine subspace of sufficiently high dimension requires large circuits. Then they simply invoke a recent result of Ben-Sasson and Kopparty, who construct an explicit function with this property. That is, the paper shows that a certain pseudo-randomness property is enough for better lower bounds. Hopefully, there will be more developments in this direction.

Is this result one more step towards proving \mathrm{P} \ne \mathrm{NP}? Time will tell.

Sketching and Embedding are Equivalent for Norms

Summary

In this post I will show that any normed space that allows good sketches is necessarily embeddable into an \ell_p space with p close to 1. This provides a partial converse to a result of Piotr Indyk, who showed how to sketch metrics that embed into \ell_p for 0 < p \le 2. A cool bonus of this result is that it gives a new technique for obtaining sketching lower bounds.

This result appeared in a recent paper of mine that is a joint work with Alexandr Andoni and Robert Krauthgamer. I am pleased to report that it has been accepted to STOC 2015.

Sketching

One of the exciting relatively recent paradigms in algorithms is that of sketching. The high-level idea is as follows: if we are interested in working with a massive object x, let us start by compressing it to a short sketch \mathrm{sketch}(x) that preserves the properties of x we care about. One great example of sketching is the Johnson–Lindenstrauss lemma: if we work with n high-dimensional vectors and are interested in the Euclidean distances between them, we can project the vectors onto a random O(\varepsilon^{-2} \cdot \log n)-dimensional subspace, and with high probability this will preserve all the pairwise distances up to a factor of 1 + \varepsilon.
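This is easy to see in action (a quick numpy illustration of mine, with arbitrarily chosen sizes; a scaled Gaussian matrix plays the role of the random subspace):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, d = 30, 3000, 1000
X = rng.standard_normal((n, dim))               # n points in high dimension
P = rng.standard_normal((dim, d)) / np.sqrt(d)  # random projection matrix
Y = X @ P                                       # the n sketched points

def pairwise(A):
    sq = (A * A).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * A @ A.T, 0.0))

mask = ~np.eye(n, dtype=bool)
ratios = pairwise(Y)[mask] / pairwise(X)[mask]
print(ratios.min(), ratios.max())  # both close to 1: distances are preserved
```

Each point went from 3000 coordinates down to 1000, yet every pairwise distance survives up to a few percent.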

It would be great to understand for which computational problems sketching is possible, and how efficient it can be made. There are quite a few nice results (both upper and lower bounds) along these lines (see, e.g., graph sketching or a recent book about sketching for numerical linear algebra), but a general understanding has yet to emerge.

Sketching for metrics

One of the main motivations for studying sketching is fast computation and indexing of similarity measures \mathrm{sim}(x, y) between two objects x and y. Oftentimes similarity between objects is modeled by some metric d(x, y) (but not always! think KL divergence): for instance, the above example of the Euclidean distance falls into this category. Thus, instantiating the above general question, one can ask: for which metric spaces do there exist good sketches? That is, when is it possible to compute a short sketch \mathrm{sketch}(x) of a point x such that, given two sketches \mathrm{sketch}(x) and \mathrm{sketch}(y), one is able to estimate the distance d(x, y)?

The following communication game captures the question of sketching metrics. Alice and Bob each have a point from a metric space X (say, x and y, respectively). Suppose, in addition, that either d_X(x, y) \le r or d_X(x, y) > D \cdot r (where r and D are the parameters known from the beginning). Both Alice and Bob send messages \mathrm{sketch}(x) and \mathrm{sketch}(y) that are s bits long to Charlie, who is supposed to distinguish two cases (whether d_X(x, y) is small or large) with probability at least 0.99. We assume that all three parties are allowed to use shared randomness. Our main goal is to understand the trade-off between D (approximation) and s (sketch size).

Arguably, the most important metric spaces are \ell_p spaces. Formally, for 1 \leq p \leq \infty we define \ell_p^d to be a d-dimensional space equipped with distance

\|x - y\|_p = \Bigl(\sum_{i=1}^d |x_i - y_i|^p\Bigr)^{1/p}

(when p = \infty this expression should be understood as \max_{1 \leq i \leq d} |x_i - y_i|). One can similarly define \ell_p spaces for 0 < p < 1; even if the triangle inequality does not hold for this case, it is nevertheless a meaningful notion of distance.
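For concreteness, the definition in code (a trivial helper of mine):

```python
import math

def lp_dist(x, y, p):
    """The l_p distance between two equal-length real vectors."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

print(lp_dist([0, 0], [3, 4], 2))         # 5.0 (Euclidean)
print(lp_dist([0, 0], [3, 4], 1))         # 7.0 (Manhattan)
print(lp_dist([0, 0], [3, 4], math.inf))  # 4 (max coordinate)
print(lp_dist([0, 0], [3, 4], 0.5))       # larger than 7: for p < 1 sums "inflate"
```

The last line is a small numerical witness of the failure of the triangle inequality regime for 0 < p < 1.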

It turns out that \ell_p spaces exhibit very interesting behavior, when it comes to sketching. Indyk showed that for 0 < p \le 2 one can achieve approximation D = 1 + \varepsilon and sketch size s = O(1 / \varepsilon^2) for every \varepsilon > 0 (for 1 \le p \le 2 this was established before by Kushilevitz, Ostrovsky and Rabani). It is quite remarkable that these bounds do not depend on the dimension of a space. On the other hand, for \ell_p^d with p > 2 the dependence on the dimension is necessary. It turns out that for constant approximation D = O(1) the optimal sketch size is \widetilde{\Theta}(d^{1 - 2/p}).
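To give a flavor of Indyk's upper bound, here is a tiny numerical demo (my own sketch, with parameters chosen arbitrarily) of the p-stable idea for p = 1: a matrix A with standard Cauchy entries makes each coordinate of A(x - y) a Cauchy variable scaled by \|x - y\|_1, so the median of the absolute values recovers the distance.

```python
import numpy as np

rng = np.random.default_rng(5)
dim, s = 2000, 2000
x, y = rng.uniform(-1, 1, dim), rng.uniform(-1, 1, dim)
A = rng.standard_cauchy((s, dim))      # 1-stable entries
# The median of |Cauchy| is 1, so the median below concentrates at ||x - y||_1.
est = np.median(np.abs(A @ (x - y)))
print(est, np.abs(x - y).sum())        # the two numbers are close
```

The median (rather than the mean) is essential here, since Cauchy variables have no finite expectation.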

Are there any other examples of metrics that admit efficient sketches (say, with constant D and s)? One simple observation is that if a metric embeds well into \ell_p for 0 < p \le 2, then one can sketch this metric well. Formally, we say that a map between metric spaces f \colon X \to Y is an embedding with distortion \widetilde{D}, if

d_X(x_1, x_2) \leq C \cdot d_Y\bigl(f(x_1), f(x_2)\bigr) \leq \widetilde{D} \cdot d_X(x_1, x_2)

for every x_1, x_2 \in X and for some C > 0. It is immediate to see that if a metric space X embeds into \ell_p for 0 < p \le 2 with distortion O(1), then one can sketch X with s = O(1) and D = O(1). Thus, we know that any metric that embeds well into \ell_p with 0 < p \le 2 is efficiently sketchable. Are there any other examples? The amazing answer is that we don’t know!

Our results

Our result shows that for a very important class of metrics—normed spaces—embedding into \ell_p is the only possible way to obtain good sketches. Formally, if a normed space X allows sketches of size s for approximation D, then for every \varepsilon > 0 the space X embeds into \ell_{1 - \varepsilon} with distortion O(sD / \varepsilon). This result together with the above upper bound by Indyk provides a complete characterization of normed spaces that admit good sketches.

Taking the above result in the contrapositive, we see that non-embeddability implies lower bounds for sketches. This is great, since it potentially allows us to employ many sophisticated non-embeddability results proved by geometers and functional analysts. Specifically, we prove two new lower bounds for sketches: for the planar Earth Mover’s Distance (building on a non-embeddability theorem by Naor and Schechtman) and for the trace norm (whose non-embeddability was proved by Pisier). In addition, we are able to unify certain known results: for instance, we classify \ell_p spaces and the cascaded norms in terms of “sketchability”.

Overview of the proof

Let me outline the main steps of the proof of the implication “good sketches imply good embeddings”. The following definition is central to the proof. Let us call a map f \colon X \to Y between two metric spaces (s_1, s_2, \tau_1, \tau_2)-threshold, if for every x_1, x_2 \in X:

  • d_X(x_1, x_2) \leq s_1 implies d_Y\bigl(f(x_1), f(x_2)\bigr) \leq \tau_1,
  • d_X(x_1, x_2) \geq s_2 implies d_Y\bigl(f(x_1), f(x_2)\bigr) \geq \tau_2.

One should think of threshold maps as very weak embeddings that merely
preserve certain distance scales.

The proof can be divided into two parts. First, we prove that for a normed space X that allows sketches of size s and approximation D there exists a (1, O(sD), 1, 10)-threshold map to a Hilbert space. Then, we prove that the existence of such a map implies the existence of an embedding into \ell_{1 - \varepsilon} with distortion O(sD / \varepsilon).

The first half goes roughly as follows. Assume that there is no (1, O(sD), 1, 10)-threshold map from X to a Hilbert space. Then, by convex duality, this implies certain Poincaré-type inequalities on X. This, in turn, implies sketching lower bounds for \ell_{\infty}^k(X) (the direct sum of k copies of X, where the norm is defined as the maximum of the norms of the components) by a result of Andoni, Jayram and Pătrașcu (which is based on a very important notion of information complexity). Then, crucially using the fact that X is a normed space, we conclude that X itself does not have good sketches (this step follows from the fact that every normed space is of type 1 and of cotype \infty).

The second half uses tools from nonlinear functional analysis. First, building on an argument of Johnson and Randrianarivony, we show that, for normed spaces, a (1, O(sD), 1, 10)-threshold map into a Hilbert space implies a uniform embedding into a Hilbert space – that is, a map f \colon X \to H, where H is a Hilbert space, such that

L\bigl(\|x_1 - x_2\|_X\bigr) \leq \bigl\|f(x_1) - f(x_2)\bigr\|_H \leq U\bigl(\|x_1 - x_2\|_X\bigr),

where L, U \colon [0; \infty) \to [0; \infty) are non-decreasing functions such that L(t) > 0 for every t > 0 and U(t) \to 0 as t \to 0. Both L and U are allowed to depend only on s and D. This step uses a certain Lipschitz extension-type theorem and averaging via bounded invariant means. Finally, we conclude the proof by applying theorems of Aharoni-Maurey-Mityagin and Nikishin and obtain a desired (linear) embedding of X into \ell_{1 - \varepsilon}.

Open problems

Let me finally state several open problems.

The first obvious open problem is to extend our result to as large a class of metric spaces as possible. Two notable examples one should keep in mind are the Khot–Vishnoi space and the Heisenberg group. In both cases, the space admits good sketches (since both spaces embed into \ell_2-squared), but neither of them embeds into \ell_1. I do not know whether these spaces are embeddable into \ell_{1 - \varepsilon}, but I am inclined to suspect so.

The second open problem deals with linear sketches. For a normed space, one can require that a sketch is of the form \mathrm{sketch}(x) = Ax, where A is a random matrix generated using shared randomness. Our result then can be interpreted as follows: any normed space that allows sketches of size s and approximation D allows a linear sketch with one linear measurement and approximation O(sD) (this follows from the fact that for \ell_{1 - \varepsilon} there are good linear sketches). But can we always construct a linear sketch of size f(s) and approximation g(D), where f(\cdot) and g(\cdot) are some (ideally, not too quickly growing) functions?

Finally, the third open problem is about spaces that allow essentially no non-trivial sketches. Can one characterize d-dimensional normed spaces, where any sketch for approximation O(1) must have size \Omega(d)? The only example I can think of is a space that contains a subspace that is close to \ell_{\infty}^{\Omega(d)}. Is this the only case?

Ilya

So Alice and Bob want to flip a coin… – STOC 2014 Recaps (Part 10)

A major use case for coin flipping (with actual coins) is when you’re with friends and you have to decide where to eat. This agonizing decision process can be elegantly avoided when randomness is used. But who’s doing the coin flipping? How do you know your friend isn’t secretly choosing which restaurant to go to? Cryptography offers a solution to this, and Sunoo will tell us how this solution is actually equivalent to one-way functions!


Sunoo Park on Coin Flipping of Any Constant Bias Implies One-Way Functions, by Itay Berman, Iftach Haitner, and Aris Tentes

In this post, we look at a fundamental problem that has plagued humankind since long before theoretical computer science: if I don’t trust you and you don’t trust me, how can we make a fair random choice? This problem was once solved in the old-fashioned way of flipping a coin, but theoretical computer science has made quite some advances since.

What are the implications of this? In their recent STOC paper, Berman, Haitner, and Tentes show that the ability for two parties to flip a (reasonably) fair coin means that one-way functions exist. This, in turn, has far-reaching cryptographic implications.

A function f is one-way if it is “easy” to compute f(x) given any input x, and it is “hard”, given the image y of a random input, to find a preimage x' such that f(x')=y. The existence of one-way functions implies a wide range of fundamental cryptographic primitives, including pseudorandom generators, pseudorandom functions, symmetric-key encryption, bit commitments, and digital signatures – and vice versa: the seminal work of Impagliazzo and Luby [1] showed that the existence of cryptography based on complexity-theoretic hardness assumptions – encompassing all of the aforementioned primitives – implies that one-way functions exist.

For coin-flipping protocols, however, only somewhat more restricted results were known. Coin-flipping has long been a subject of interest in cryptography, since the early work of Blum [2], which described the following problem:

“Alice and Bob want to flip a coin by telephone. (They have just divorced, live in different cities, want to decide who gets the car.)”

More formally, a coin-flipping protocol is a protocol in which two players interact by exchanging messages and, upon completion, output a single bit interpreted as the outcome of a coin flip. The aim is that the coin should be (close to) unbiased, even if one of the players “cheats” and tries to bias the outcome towards a certain value. We say that a protocol has bias \varepsilon if no cheating player can shift the probability of either outcome by more than \varepsilon away from 1/2; constant bias means that \varepsilon is a constant, rather than a function of the security parameter.
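As a warm-up, Blum's protocol itself is easy to simulate. Below is a toy version of mine, in which a hash-based commitment stands in for the cryptographic commitments whose existence is exactly what one-way functions buy you: Alice commits to a random bit, Bob replies with his bit in the clear, Alice opens the commitment, and the coin is the XOR.

```python
import hashlib
import secrets

def commit(bit):
    """Commit to a bit by hashing it together with a random nonce."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([bit])).digest(), nonce

def verify(com, nonce, bit):
    return hashlib.sha256(nonce + bytes([bit])).digest() == com

alice_bit = secrets.randbelow(2)
com, nonce = commit(alice_bit)        # round 1: Alice -> Bob, commitment only
bob_bit = secrets.randbelow(2)        # round 2: Bob -> Alice, his bit in the clear
assert verify(com, nonce, alice_bit)  # round 3: Alice opens; Bob checks
print("coin:", alice_bit ^ bob_bit)
```

Hiding (Bob learns nothing from the commitment) and binding (Alice cannot open to the other bit) are exactly what keep either party from biasing the XOR.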

Impagliazzo and Luby’s original paper showed that coin-flipping protocols achieving negligible bias (that is, they are very close to perfect!) imply one-way functions. Subsequently, Maji, Prabhakaran and Sahai [3] proved that coin-flipping protocols with a constant number of rounds (and any non-trivial bias, i.e. 1/2-o(1)) also imply one-way functions. Yet more recently, Haitner and Omri [4] showed that the same holds for any coin-flipping protocol with a constant bias (namely, a bias of \frac{\sqrt{2}-1}{2}-o(1)). Finally, Berman, Haitner and Tentes proved that coin-flipping of any constant bias implies one-way functions. The remainder of this post will give a flavor of the main ideas behind their proof.

The high-level structure of the argument is as follows: given any coin-flipping protocol \Pi=(\mathsf{A},\mathsf{B}) between players \mathsf{A} and \mathsf{B}, we first define a (sort of) one-way function, then show that an adversary capable of efficiently inverting that function must be able to achieve a significant bias in \Pi. The one-way function used is the transcript function, which maps the players’ random coinflips to a protocol transcript. The two main neat insights are these:

  • Consider the properties of a coin-flipping protocol when interpreted as a zero-sum game between two players: Alice wins if the outcome is 1, and Bob wins otherwise. If the players play optimally, who wins? It turns out that from the winner, we can deduce that there is a set of protocol transcripts where the outcome is bound to be the winning outcome, no matter what the losing player does: that is, transcripts that are “immune” to the losing player.
  • The paper introduces a recursive variant of the biasing attack of Haitner and Omri [4]. The new attack can be employed by the adversary in order to generate a transcript that lies in the “immune” set with probability close to 1 – so this adversary (who has access to an inverter for the transcript function) can bias the protocol outcome with high probability.

The analysis is rather involved, and there are some fiddly details to resolve, such as how a bounded adversary can simulate the recursive attack efficiently, and how to ensure that the inverter will work for the particular query distribution of the adversary. Without going into all those details, the last part of this post describes the optimal recursive attack.

Let \mathsf{Inv} be the function that takes as input a pair (\mathbf{t},b), where \mathbf{t} is a random transcript of a partial (incomplete) honest execution of \Pi and b\in\{0,1\}, and outputs a random pair of coinflips (r_1,r_2) for the players satisfying the following two conditions: (1) they are consistent with \mathbf{t}, that is, the coinflips could be plausible next coinflips given the partial transcript \mathbf{t}; and (2) there exists a continuation of the protocol after \mathbf{t} and the next-coinflips (r_1,r_2) that leads to the protocol outcome b.

It seems that an effective way for the adversary to use \mathsf{Inv} might be to call \mathsf{Inv}(\mathbf{t},1) for each partial transcript \mathbf{t} at which the relevant player has to make a move, and then to behave according to the outputted coins. We call this strategy the biased-continuation attack, which is the crucial attack underlying the result of [4].

The new paper proposes a recursive biased-continuation attack that adds an additional layer of sophistication. Let \mathsf{A}^{(0)}=\mathsf{A} be the honest first player’s strategy. Now, define \mathsf{A}^{(i)} to be the attacker which, rather than sampling a random 1-continuation among all the possible honest continuations of the protocol (\mathsf{A},\mathsf{B}), instead samples a random 1-continuation among all continuations of (\mathsf{A}^{(i-1)},\mathsf{B}). Note that \mathsf{A}^{(1)} is exactly the biased-continuation attacker described above! It turns out that as the number of recursions grows, the probability that the resulting transcript lands in the “immune” set approaches 1 – meaning a successful attack! Naïvely, this attack may require exponential time due to the many recursions required; however, the paper circumvents this by analyzing the probability that the transcript lands in a set which is “almost immune”, and finding that this probability approaches 1 significantly faster.

[1] One-Way Functions Are Essential for Complexity Based Cryptography. Impagliazzo, Luby (FOCS 1989).
[2] Coin Flipping by Telephone: A Protocol for Solving Impossible Problems. Blum (CRYPTO 1981).
[3] On the Computational Complexity of Coin Flipping. Maji, Prabhakaran, Sahai (FOCS 2010).
[4] Coin Flipping with Constant Bias Implies One-Way Functions. Haitner, Omri (FOCS 2011).

Faster, I say! The race for the fastest SDD linear system solver – STOC 2014 Recaps (Part 9)

In the next post in our series of STOC 2014 recaps, Adrian Vladu tells us about some of the latest and greatest in Laplacian and SDD linear system solvers. There’s been a flurry of exciting results in this line of work, so we hope this gets you up to speed.


The Monday morning session was dominated by a nowadays popular topic, symmetric diagonally dominant (SDD) linear system solvers. Richard Peng started by presenting his work with Dan Spielman, the first parallel solver with near linear work and poly-logarithmic depth! This is exciting, since parallel algorithms are used for large scale problems in scientific computing, so this is a result with great practical applications.

The second talk was given by Jakub Pachocki and Shen Chen Xu from CMU, and presented the result of merging two papers. The first result is a new type of tree that can be used as a preconditioner. The second is a more efficient solver, which together with these trees shaved one more \log n factor off in the race for the fastest solver.

Before getting into more specific details, it might be a good idea to provide a bit of background on the vast literature of Laplacian solvers.

Typically, linear systems Ax = b are easier to solve whenever A has some structure in it. A particular class we care about is that of positive semidefinite (PSD) matrices. They work nicely because the solution is the minimizer of the quadratic form f(x) = \frac{1}{2} x^T A x - b^T x, which happens to be a convex function due to the PSD-ness of A. Hence we can use various versions of gradient descent, whose convergence usually depends on the condition number of A.
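The quadratic-form view in code (a toy numpy sketch of mine, with a dense matrix and a crude fixed step size):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)                 # positive definite, hence f is convex
b = rng.standard_normal(n)

x = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2)       # conservative step: 1 / lambda_max
for _ in range(5000):
    x -= step * (A @ x - b)             # the gradient of f at x is Ax - b
print(np.linalg.norm(A @ x - b))        # tiny residual: x solves Ax = b
```

The number of iterations needed scales with the condition number of A, which is exactly why the preconditioning ideas below matter.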

A subset of PSD matrices are Laplacian matrices, which are nothing else but graph Laplacians; using an easy reduction, one can show that any SDD system can be reduced to a Laplacian system. Laplacians are great because they carry a lot of combinatorial structure. Instead of having to suffer through a lot of scary algebra, this is the place where we finally get to solve some fun problems on graphs. The algorithms we aim for have running time close to linear in the sparsity of the matrix.

One reason why graphs are nice is that we know how to approximate them with other, simpler graphs. More specifically, when given a graph G, we care about finding a sparser graph H such that L_H \preceq L_G \preceq \alpha \cdot L_H for some small \alpha (the smaller, the better).^a The point is that whenever you do gradient descent in order to minimize f(x), you can take large steps by solving a system in the sparser L_H. Of course, this requires another linear system solve, but that one only needs to be done on a sparser graph. Applying this idea recursively eventually yields efficient solvers. A lot of combinatorial work is spent on understanding how to compute these sparser graphs.

In their seminal work, Spielman and Teng used ultrasparsifiers^b as their underlying combinatorial structure, and after many pages of work they obtained a near-linear algorithm with a large polylogarithmic factor in the running time. Eventually, Koutis, Miller and Peng came up with a much cleaner construction and showed how to construct a chain of sparser and sparser graphs, which yielded a solver that was actually practical. Subsequently, people have spent a lot of time trying to shave log factors off the running time; see [9], [6], [7], [12] (the last of which was presented at this STOC), and the list will probably continue.

After this digression, we can get back to the conference program and discuss the results.


An Efficient Parallel Solver for SDD Linear Systems by Richard Peng and Daniel Spielman

How do we solve a system L_G x = b? We need to find a way to efficiently apply the operator L_G^+ to b. Even Laplacians are not easy to invert, and what’s worse, their pseudoinverses might not even be sparse. However, we can still represent L_G^+ as a product of sparse matrices which are easy to compute.

We can gain some inspiration from trying to numerically approximate the inverse of (1-x) for some small real x. Taking the Taylor expansion we get that (1-x)^{-1} = \sum_{i \geq 0} x^i = \prod_{k \geq 0} (1+x^{2^k}). Notice that in order to get \epsilon precision, we only need to take the product of the first \Theta(\log 1/\epsilon) factors. It would be great if we could approximate matrix inverses the same way. Actually, we can, since for matrices of norm less than 1 we have the identity (I-A)^{-1} = \prod_{k\geq 0}(I + A^{2^k}). At this point we’d be tempted to think that we’re almost done, since we can just write L_G = D - A, and try to invert I - D^{-1/2} A D^{-1/2}. However we would still need to compute matrix powers, and those matrices might again not even be sparse, so this approach needs more work.
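Both expansions are easy to check numerically (a quick sanity check of mine):

```python
import numpy as np

# Scalar version: five factors already give (1 - x)^{-1} up to an x^32 error.
x = 0.3
approx = 1.0
for k in range(5):
    approx *= 1.0 + x ** (2 ** k)
print(approx, 1.0 / (1.0 - x))

# Matrix version, valid whenever the spectral norm of A is below 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.4 / np.linalg.norm(A, 2)         # rescale so that ||A|| = 0.4 < 1
P = np.eye(5)
for k in range(6):
    P = P @ (np.eye(5) + np.linalg.matrix_power(A, 2 ** k))
print(np.linalg.norm(P - np.linalg.inv(np.eye(5) - A)))  # essentially zero
```

The doubling of exponents is the source of the \Theta(\log 1/\epsilon) factor count: the first K factors of the product capture the Neumann series up to the term A^{2^K - 1}.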

Richard presented a variation of this idea that is more amenable to SDD matrices. He writes

(D-A)^{-1} = \frac{1}{2}( D^{-1} + (I+D^{-1}A)(D - AD^{-1}A)^{-1}(I+AD^{-1} ) )

The only hard part of applying this inverse operator (D-A)^{-1} to a vector consists of left multiplying by (D-AD^{-1}A)^{-1}. How to do this? One crucial ingredient is the fact that M = D-AD^{-1}A is also SDD! Therefore we can recurse, and solve a linear system in M. You might say that we won’t be able to do it efficiently, since M is not sparse. But with a little bit of work and the help of existing spectral sparsification algorithms M can be approximated with a sparse matrix.
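The identity itself is easy to verify numerically (my own check, on a small random SDD matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
A = rng.uniform(0.0, 1.0, (n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)                  # symmetric, zero diagonal
D = np.diag(A.sum(axis=1) + 0.5)          # D - A is strictly diagonally dominant
I, Dinv = np.eye(n), np.linalg.inv(D)

lhs = np.linalg.inv(D - A)
rhs = 0.5 * (Dinv
             + (I + Dinv @ A) @ np.linalg.inv(D - A @ Dinv @ A) @ (I + A @ Dinv))
print(np.abs(lhs - rhs).max())            # essentially zero: the identity holds
```

The algebra behind it is the factorization D - AD^{-1}A = (D-A)D^{-1}(D+A), which also makes clear why the inner matrix is again SDD.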

Notice that at the k^{th} level of recursion, the operator we need to apply is \left(D-(D^{-1}A)^{2^k}\right)^{-1}. A quick calculation shows that if the condition number of D-A is \kappa, then the eigenvalues of D^{-1}A are bounded by 1-1/\kappa in absolute value. This means that after k = O(\log \kappa) levels, the eigenvalues of (D^{-1}A)^{2^k} are close to 0, so we can just approximate the operator with D^{-1} without paying too much for the error.

There are a few details left out. Sparsifying M requires a bit of understanding of its underlying structure. Also, in order to do this in parallel, the authors originally employed the spectral sparsification algorithm of Spielman and Teng, combined with a local clustering algorithm of Orecchia, Sachdeva and Vishnoi. Blackboxing these two sophisticated results might question the practicality of the algorithm. Fortunately, Koutis recently produced a simple self-contained spectral sparsification algorithm, which parallelizes and can replace all the heavy machinery in Richard and Dan’s paper.


Solving SDD Linear Systems in Nearly m \log^{1/2} n Time by Michael B. Cohen, Rasmus Kyng, Gary L. Miller, Jakub W. Pachocki, Richard Peng, Anup B. Rao, and Shen Chen Xu

Jakub Pachocki and Shen Chen Xu talked about two results, which together yield the fastest SDD system solver to date. The race is still on!

Let me go through a bit more background. Earlier on I mentioned that graph preconditioners are used to take long steps while doing gradient descent. A dual of gradient descent on the quadratic function is the Richardson iteration. This is yet another iterative method, which refines a coarse approximation to the solution of a linear system. Let L_G be the Laplacian of our given graph, and L_H the Laplacian of its preconditioner. Let us assume that we have access to the inverse of L_H. The Richardson iteration computes a sequence x_0, x_1, x_2, \dots, which converges to the solution of the system. It starts with a weak estimate of the solution and iteratively attempts to decrease the norm of the residue b - L_G x_k by updating the current solution with a coarse approximation to the solution of the system L_G x = b - L_G x_k. That coarse approximation is computed using L_H^{-1}. Therefore the steps are given by

x_{k+1} = x_k + \alpha \cdot L_H^{-1} ( b - L_G x_k )

where \alpha \in (0,1] is a parameter that adjusts the length of the step. The better L_H approximates L_G, the fewer steps we need to make.
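Here is what the iteration looks like in code (an illustrative numpy sketch of mine, where a dense positive definite matrix stands in for L_G and an exact dense solve stands in for applying L_H^{-1}):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
G = rng.standard_normal((n, n))
L_G = G @ G.T + n * np.eye(n)           # positive definite stand-in for L_G
L_H = L_G + 0.1 * np.eye(n)             # a (very good) "preconditioner"
b = rng.standard_normal(n)

x = np.zeros(n)
for _ in range(50):                     # Richardson iteration with alpha = 1
    x = x + np.linalg.solve(L_H, b - L_G @ x)
print(np.linalg.norm(b - L_G @ x))      # the residual is driven to ~0
```

The error contracts by a factor governed by how well L_H approximates L_G (here artificially well, so convergence is very fast).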

The problem that Jakub and Shen talked about was finding these good preconditioners. The way they do it is by looking more closely at the Richardson iteration, and weakening the requirements. Instead of having the preconditioner approximate L_G spectrally, they only impose some moment bounds. I will not describe them here, but feel free to read the paper. Proving that these moment bounds can be satisfied using a sparser preconditioner than those that have been so far used in the literature constitutes the technical core of the paper.

Just like in the past literature, these preconditioners are obtained by starting with a good tree, and sampling extra edges. Traditionally, people used low stretch spanning trees. The issue with them is that the number of edges in the preconditioner is determined by the average stretch of the tree, and we can easily check that for the square grid this is \Omega(\log n). Unfortunately, in general we don’t know how to achieve this bound yet; the best known result is off by a \log \log n factor. It turns out that we can still get preconditioners by looking at a different quantity, called the \ell_p stretch (0 < p < 1), which can be brought down to O(\log^p n). This essentially eliminates the need for computing optimal low stretch spanning trees. Furthermore, these trees can be computed really fast, O(m \log \log n) time in the RAM model, and the algorithm parallelizes.

This result consists of a careful combination of existing algorithms on low stretch embeddings of graphs into tree metrics and low stretch spanning trees. I will talk more about these embedding results in a future post.

a. \preceq is also known as the Löwner partial order. A \preceq B is equivalent to 0 \preceq B-A, which says that B-A is PSD.
b. A (k, h)-ultrasparsifier H of G is a graph with n-k+1 edges such that L_H \preceq L_G \preceq h \cdot L_H. It turns out that one can efficiently construct (k, (n/k)\,\mathrm{polylog}(n))-ultrasparsifiers. So by adding a few edges to a spanning tree, you can drastically reduce the relative condition number with respect to the initial graph.

[1] Approaching optimality for solving SDD systems, Koutis, Miller, Peng (2010).
[2] Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems, Spielman, Teng (2004).
[3] A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning, Spielman, Teng (2013).
[4] Spectral Sparsification of Graphs, Spielman, Teng (2011).
[5] Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric, Diagonally Dominant Linear Systems, Spielman, Teng (2014).
[6] A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time, Kelner, Orecchia, Sidford, Zhu (2013).
[7] Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems, Lee, Sidford (2013).
[8] A nearly-m \log n time solver for SDD linear systems, Koutis, Miller, Peng (2011).
[9] Improved Spectral Sparsification and Numerical Algorithms for SDD Matrices, Koutis, Levin, Peng (2012).
[10] Simple parallel and distributed algorithms for spectral graph sparsification, Koutis (2014).
[11] Preconditioning in Expectation, Cohen, Kyng, Pachocki, Peng, Rao (2014).
[12] Stretching Stretch, Cohen, Miller, Pachocki, Peng, Xu (2014).
[13] Solving SDD Linear Systems in Nearly m \log^{1/2} n Time, Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu (2014).
[14] Approximating the Exponential, the Lanczos Method and an \tilde{O}(m)-Time Spectral Algorithm for Balanced Separator, Orecchia, Sachdeva, Vishnoi (2012).

Spaghetti code crypto – STOC 2014 Recaps (Part 6)

We return to our STOC recap programming with Justin Holmgren, who will tell us about an exciting new result in program obfuscation. (However, he doesn’t tell us how to say “indistinguishability obfuscation” 10 times fast).


Justin Holmgren on How to Use Indistinguishability Obfuscation: Deniable Encryption, and More, by Amit Sahai and Brent Waters

One very recent topic of study in cryptography is program obfuscation – that is, how to make a program maximally unintelligible, while still preserving the functionality. We will represent programs as circuits, which for most purposes is equivalent to any other representation. At some level, obfuscation of a program seems like it should be an easy task – any programmer knows that it’s easy to end up with unreadable spaghetti code – but a theoretical treatment of the issue was lacking, and finding a construction with provable security properties has proven quite difficult.

In 2001, Barak et al. [1] precisely formulated black-box obfuscation, which says that if you can compute something given the obfuscated program, you could also compute it given black-box access to the program. This is a natural and very strong notion of obfuscation. If we could obfuscate programs according to this definition, we could easily solve many difficult problems in cryptography. For example, a public-key fully homomorphic encryption scheme could be constructed from an ordinary symmetric-key encryption scheme by obfuscating a program which decrypts two input ciphertexts, and outputs the encryption of their NAND. Unfortunately, [1] also showed that black-box obfuscation is impossible to achieve.

One weaker definition of obfuscation that [1] did not rule out is called indistinguishability obfuscation (iO). It requires that if two circuits C_1 and C_2 are functionally equivalent (for all x, C_1(x) = C_2(x)), then their obfuscations are indistinguishable. In 2013, Garg et al. [2] gave the first candidate construction for iO. This construction is not proven secure under standard assumptions (e.g. the hardness of factoring), but no attacks on the construction are currently known, and it has been proven that certain types of generic attacks cannot possibly work.

It isn’t immediately clear whether iO is useful. After all, what information is the obfuscation really hiding if it only guarantees indistinguishability from functionally equivalent programs? “How to use Indistinguishability Obfuscation: Deniable Encryption and More”, by Amit Sahai and Brent Waters [3], shows several applications of iO to building cryptographic primitives, and proposes iO as a “hub of cryptography”, meaning that constructions which use iO as a building block would be able to benefit from improvements in the efficiency or underlying computational assumptions of iO constructions.

As a first example of how to use iO, they show how to construct an IND-CPA public key cryptosystem based on one-way functions and iO. This uses a few other basic and well-studied cryptographic primitives (it is known that all of them exist if one-way functions exist). First, a pseudorandom generator is a deterministic function which takes as input a random string s and computes a longer string r which is indistinguishable from truly random. The length of r can be polynomially larger than the length of s, but we will let r be twice the length of s. A pseudorandom function family is a keyed function: given a secret key k and an input x, it produces an output of some fixed length. Oracle access to f_k(\cdot) for a random unknown k is computationally indistinguishable from oracle access to a truly random function. The main special property that [3] requires is a puncturable pseudorandom function family. That is, for a secret key k and an input x, a “punctured” key k\{x\} can be computed. k\{x\} allows one to compute f_k(y) for all y \neq x, but reveals essentially no information about f_k(x).
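For intuition, here is a toy puncturable PRF in the style of the classic GGM tree construction, using SHA-256 as a stand-in length-doubling PRG. This is an illustrative sketch (names and the hash-based PRG are mine), not the concrete construction used in the paper.

```python
import hashlib

def prg_half(seed: bytes, bit: int) -> bytes:
    # Toy length-doubling PRG: half `bit` of G(seed) is a hash of seed || bit.
    return hashlib.sha256(seed + bytes([bit])).digest()

def prf(key: bytes, x: str) -> bytes:
    """GGM-style PRF: walk down the binary tree along the bits of x."""
    s = key
    for b in x:
        s = prg_half(s, int(b))
    return s

def puncture(key: bytes, x: str) -> list:
    """Punctured key k{x}: the sibling seed at every level of x's path.
    (In practice x itself is also part of the punctured key.)"""
    s, siblings = key, []
    for b in x:
        siblings.append(prg_half(s, 1 - int(b)))
        s = prg_half(s, int(b))
    return siblings

def prf_punctured(pkey: list, x: str, y: str) -> bytes:
    """Evaluate f_k(y) from a key punctured at x; impossible iff y == x."""
    assert y != x
    # Start from the sibling seed at the first bit where y diverges from x.
    i = next(j for j in range(len(x)) if x[j] != y[j])
    s = pkey[i]
    for b in y[i + 1:]:
        s = prg_half(s, int(b))
    return s

k = b"master-seed"
x = "0110"
pk = puncture(k, x)
assert prf_punctured(pk, x, "0100") == prf(k, "0100")
```

The punctured key determines f_k(y) for every y ≠ x, yet f_k(x) sits under a seed that was never revealed, which is exactly the property the hybrid argument below exploits.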

The power of iO comes from combining its indistinguishability guarantees with those of other cryptographic primitives, in what is known as the hybrid method. We will start with the security game played with a real scheme, and give a sequence of other games, each indistinguishable from the previous one to the adversary, such that the adversary obviously has no advantage in the last game. This will prove that the scheme is secure.

For example, a simple public key encryption scheme for one-bit messages (let G be a PRG, and F be a puncturable pseudorandom function family) would be the following.

  • The secret key SK is just a puncturable PRF key k.
  • The public key PK is the obfuscation of an Encrypt program. Encrypt takes a message m and randomness r, and outputs (G(r), F_k(G(r)) \oplus m).
  • To decrypt a ciphertext (c_1, c_2) given SK, one just computes F_k(c_1)  \oplus c_2.
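To see the data flow, here is a toy (and completely insecure) instantiation of the three bullets, with hash-based stand-ins for G and F_k and, crucially, no obfuscation of Encrypt at all, so the key would not actually be hidden; all names here are illustrative.

```python
import hashlib
import secrets

def G(r: bytes) -> bytes:
    # Stand-in PRG: maps a 32-byte seed to a 64-byte string.
    return hashlib.sha256(b"prg0" + r).digest() + hashlib.sha256(b"prg1" + r).digest()

def F(k: bytes, x: bytes) -> int:
    # Stand-in PRF with one-bit output.
    return hashlib.sha256(k + x).digest()[0] & 1

def encrypt(k: bytes, m: int) -> tuple:
    # In the real scheme this program is published *obfuscated*, so that
    # k stays hidden even though Encrypt itself is public.
    r = secrets.token_bytes(32)
    c1 = G(r)
    return (c1, F(k, c1) ^ m)

def decrypt(k: bytes, ct: tuple) -> int:
    c1, c2 = ct
    return F(k, c1) ^ c2

k = secrets.token_bytes(32)
for m in (0, 1):
    assert decrypt(k, encrypt(k, m)) == m
```

The security of the real scheme rests entirely on the obfuscation of `encrypt` together with puncturing, as the hybrid argument below makes precise.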

In the IND-CPA security game, there is a challenger and an adversary. The challenger generates the public and secret key, and gives the public key to the adversary. The challenger also generates an encryption of a random bit b, and gives this to the adversary. The adversary wins if he can guess the bit that the challenger chose.

In the first hybrid, the challenger encrypts by choosing a truly random \tilde{r}, and generating the ciphertext (\tilde{r}, F_k(\tilde{r}) \oplus b). This is indistinguishable by the security property of a pseudorandom generator.

In the next hybrid, the program Encrypt which is obfuscated and given to the adversary uses a punctured key k\{\tilde{r}\} instead of the full PRF key k. This is with high probability functionally equivalent, because \tilde{r} is most likely not in the range of G. So by the security of iO, this hybrid is indistinguishable from the previous one.

Next, the ciphertext is generated by choosing a truly random bit p, and generating the ciphertext (\tilde{r}, p \oplus b). This is also indistinguishable to the adversary, because the punctured key reveals nothing about F_k(\tilde{r}). That is, if these two hybrids were distinguishable, then given only k\{\tilde{r}\}, one could distinguish between a truly random bit p and the true value F_k(\tilde{r}).

In this hybrid, it is clear (by analogy to a one-time pad) that no adversary can distinguish a ciphertext encrypting 0 from a ciphertext encrypting 1. So the original scheme is also secure.

Sahai and Waters achieve more than just public-key encryption – they also achieved the long-standing open problem of publicly deniable encryption. This is an encryption system for which if Encrypt(m ; r) = c, one can also compute a “fake” randomness r' for any other message m' such that Encrypt(m'; r') is also equal to c, and r' is indistinguishable from r. The motivation is that after a message is encrypted, coercing the encrypter into revealing his message is futile – there is no way of verifying whether what he says is the truth. Note that of course the decrypter could be coerced to reveal his secret key, but there’s nothing that can be done about that.

The basic idea of the deniable encryption scheme is that the encryption public key is again an obfuscated program – this program takes a message and randomness string. If the randomness string has some rare special structure (it is a hidden sparse trigger), it is interpreted as encoding a particular ciphertext, and the program will output that ciphertext. Otherwise, the program encrypts normally. There is also a public obfuscated Explain program, which allows anybody to sample a random hidden sparse trigger, but not test whether or not a string is a hidden sparse trigger.

At present, iO obfuscation is known to imply some very powerful primitives, including for example multi-input functional encryption. It is still open to construct fully homomorphic encryption from iO, or even collision resistant hash functions from iO, or to show that no such reduction is possible.

[1] On the (Im)possibility of Obfuscating Programs, Barak, Goldreich, Impagliazzo, Rudich, Sahai, Vadhan, Yang, 2001.
[2] Candidate Indistinguishability Obfuscation, Garg, Gentry, Halevi, Raykova, Sahai, Waters, 2013.
[3] How to Use Indistinguishability Obfuscation: Deniable Encryption, and More, Sahai, Waters, 2014.