Summary
In this post I will show that any normed space that allows good sketches is necessarily embeddable into an space with
close to
. This provides a partial converse to a result of Piotr Indyk, who showed how to sketch metrics that embed into
for
. A cool bonus of this result is that it gives a new technique for obtaining sketching lower bounds.
This result appeared in a recent paper of mine that is a joint work with Alexandr Andoni and Robert Krauthgamer. I am pleased to report that it has been accepted to STOC 2015.
Sketching
One of the exciting relatively recent paradigms in algorithms is that of sketching. The high-level idea is as follows: if we are interested in working with a massive object , let us start with compressing it to a short sketch
that preserves properties of
we care about. One great example of sketching is the Johnson-Lindenstrauss lemma: if we work with
high-dimensional vectors and are interested in Euclidean distances between them, we can project the vectors on a random
-dimensional subspace, and this will preserve with high probability all the pairwise distances up to a factor of
.
It would be great to understand, for which computational problems sketching is possible, and how efficient it can be made. There are quite a few nice results (both upper and lower bounds) along these lines (see, e.g., graph sketching or a recent book about sketching for numerical linear algebra), but the general understanding has yet to emerge.
Sketching for metrics
One of the main motivations to study sketching is fast computation and indexing of similarity measures between two objects
and
. Often times similarity between objects is modeled by some metric
(but not always! think KL divergence): for instance the above example of the Euclidean distance falls into this category. Thus, instantiating the above general question one can ask: for which metric spaces there exist good sketches? That is, when is it possible to compute a short sketch
of a point
such that, given two sketches
and
, one is able to estimate the distance
?
The following communication game captures the question of sketching metrics. Alice and Bob each have a point from a metric space (say,
and
, respectively). Suppose, in addition, that either
or
(where
and
are the parameters known from the beginning). Both Alice and Bob send messages
and
that are
bits long to Charlie, who is supposed to distinguish two cases (whether
is small or large) with probability at least
. We assume that all three parties are allowed to use shared randomness. Our main goal is to understand the trade-off between
(approximation) and
(sketch size).
Arguably, the most important metric spaces are spaces. Formally, for
we define
to be a
-dimensional space equipped with distance
(when this expression should be understood as
). One can similarly define
spaces for
; even if the triangle inequality does not hold for this case, it is nevertheless a meaningful notion of distance.
It turns out that spaces exhibit very interesting behavior, when it comes to sketching. Indyk showed that for
one can achieve approximation
and sketch size
for every
(for
this was established before by Kushilevitz, Ostrovsky and Rabani). It is quite remarkable that these bounds do not depend on the dimension of a space. On the other hand, for
with
the dependence on the dimension is necessary. It turns out that for constant approximation
the optimal sketch size is
.
Are there any other examples of metrics that admit efficient sketches (say, with constant and
)? One simple observation is that if a metric embeds well into
for
, then one can sketch this metric well. Formally, we say that a map between metric spaces
is an embedding with distortion
, if
for every and for some
. It is immediate to see that if a metric space
embeds into
for
with distortion
, then one can sketch
with
and
. Thus, we know that any metric that embeds well into
with
is efficiently sketchable. Are there any other examples? The amazing answer is that we don’t know!
Our results
Our result shows that for a very important class of metrics—normed spaces—embedding into is the only possible way to obtain good sketches. Formally, if a normed space
allows sketches of size
for approximation
, then for every
the space
embeds into
with distortion
. This result together with the above upper bound by Indyk provides a complete characterization of normed spaces that admit good sketches.
Taking the above result in the contrapositive, we see that non-embeddability implies lower bounds for sketches. This is great, since it potentially allows us to employ many sophisticated non-embeddability results proved by geometers and functional analysts. Specifically, we prove two new lower bounds for sketches: for the planar Earth Mover’s Distance (building on a non-embeddability theorem by Naor and Schechtman) and for the trace norm (non-embeddability was proved by Pisier). In addition to it, we are able to unify certain known results: for instance, classify spaces and the cascaded norms in terms of “sketchability”.
Overview of the proof
Let me outline the main steps of the proof of the implication “good sketches imply good embeddings”. The following definition is central to the proof. Let us call a map between two metric spaces
-threshold, if for every
:
implies
,
implies
.
One should think of threshold maps as very weak embeddings that merely
preserve certain distance scales.
The proof can be divided into two parts. First, we prove that for a normed space that allows sketches of size
and approximation
there exists a
-threshold map to a Hilbert space. Then, we prove that the existence of such a map implies the existence of an embedding into
with distortion
.
The first half goes roughly as follows. Assume that there is no -threshold map from
to a Hilbert space. Then, by convex duality, this implies certain Poincaré-type inequalities on
. This, in turn, implies sketching lower bounds for
(the direct sum of
copies of
, where the norm is definied as the maximum of norms of the components) by a result of Andoni, Jayram and Pătrașcu (which is based on a very important notion of information complexity). Then, crucially using the fact that
is a normed space, we conclude that
itself does not have good sketches (this step follows from the fact that every normed space is of type
and is of cotype
).
The second half uses tools from nonlinear functional analysis. First, building on an argument of Johnson and Randrianarivony, we show that for normed spaces -threshold map into a Hilbert space implies a uniform embedding into a Hilbert space—that is, a map
, where
is a Hilbert space such that
where are non-decreasing functions such that
for every
and
as
. Both
and
are allowed to depend only on
and
. This step uses a certain Lipschitz extension-type theorem and averaging via bounded invariant means. Finally, we conclude the proof by applying theorems of Aharoni-Maurey-Mityagin and Nikishin and obtain a desired (linear) embedding of
into
.
Open problems
Let me finally state several open problems.
The first obvious open problem is to extend our result to as large class of general metric spaces as possible. Two notable examples one should keep in mind are the Khot-Vishnoi space and the Heisenberg group. In both cases, a space admits good sketches (since both spaces are embeddable into -squared), but neither of them is embeddable into
. I do not know, if these spaces are embeddable into
, but I am inclined to suspect so.
The second open problem deals with linear sketches. For a normed space, one can require that a sketch is of the form , where
is a random matrix generated using shared randomness. Our result then can be interpreted as follows: any normed space that allows sketches of size
and approximation
allows a linear sketch with one linear measurement and approximation
(this follows from the fact that for
there are good linear sketches). But can we always construct a linear sketch of size
and approximation
, where
and
are some (ideally, not too quickly growing) functions?
Finally, the third open problem is about spaces that allow essentially no non-trivial sketches. Can one characterize -dimensional normed spaces, where any sketch for approximation
must have size
? The only example I can think of is a space that contains a subspace that is close to
. Is this the only case?
—Ilya