Estimating Transitive Closure via Sampling

In this post, I describe an algorithm of Edith Cohen, which estimates the size of the transitive closure of a given directed graph in near-linear time. This simple, but extremely clever algorithm uses ideas somewhat similar to the algorithm of Flajolet–Martin for estimating the number of distinct elements in a stream, and to MinHash sketch of Broder1.

Suppose we have a large directed graph with n vertices and m directed edges. For a vertex v, let us denote R_v the set of vertices that are reachable from v. There are two known ways to compute sets R_v (all at the same time):

Can we do better? Turns out we can, if we are OK with merely approximating the size of every R_v. Namely, the following theorem was proved back in 1994:

Theorem 1. There exists a randomized algorithm for computing (1 + \varepsilon)-multiplicative approximation for every |R_v| with running time \varepsilon^{-2}\cdot m \cdot \mathrm{poly}(\log n).

Instead of spelling out the full proof, I will present it as a sequence of problems: each of them will likely be doable for a mathematically mature reader. Going through the problems should be fun, and besides, it will save me some typing.

Problem 1. Let f \colon V \to [0, 1] be a function that assigns random independent and uniform reals between 0 and 1 to every vertex. Let us define g(v) = \min_{w \in R_v} f(w). Show how to compute values of g(v) for all vertices v at once in time m \cdot \mathrm{poly}(\log n).

Problem 2. For a positive integer k, denote U_k the distribution of the minimum of k independent and uniform reals between 0 and 1. Suppose we receive several independent samples from U_k with an unknown value of k. Show that we can obtain a (1 + \varepsilon)-multiplicative approximation of k with probability 1 - \delta using as few as O(\log(1 / \delta) / \varepsilon^2) samples.

Problem 3. Combine the solutions for two previous problems and prove Theorem 1.


  1. These similarities explain my extreme enthusiasm towards the algorithm. Sketching-based techniques are useful for a problem covered in 6.006, yay!



3 thoughts on “Estimating Transitive Closure via Sampling

  1. Why do we requre $f$ to be special (in terms of randomness) in Problem 1? Seems like it will work for any reasonable $f$.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s