# Sketching and Embedding are Equivalent for Norms

## Summary

In this post I will show that any normed space that allows good sketches is necessarily embeddable into an $\ell_p$ space with $p$ close to $1$. This provides a partial converse to a result of Piotr Indyk, who showed how to sketch metrics that embed into $\ell_p$ for $0 < p \le 2$. A cool bonus of this result is that it gives a new technique for obtaining sketching lower bounds.

This result appeared in a recent paper of mine, which is joint work with Alexandr Andoni and Robert Krauthgamer. I am pleased to report that it has been accepted to STOC 2015.

## Sketching

One of the exciting, relatively recent paradigms in algorithms is that of sketching. The high-level idea is as follows: if we are interested in working with a massive object $x$, let us start by compressing it to a short sketch $\mathrm{sketch}(x)$ that preserves the properties of $x$ we care about. One great example of sketching is the Johnson-Lindenstrauss lemma: if we work with $n$ high-dimensional vectors and are interested in Euclidean distances between them, we can project the vectors onto a random $O(\varepsilon^{-2} \cdot \log n)$-dimensional subspace, and this will preserve with high probability all the pairwise distances up to a factor of $1 + \varepsilon$.
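Here is a minimal sketch of the Johnson-Lindenstrauss idea, under some assumptions of mine: instead of projecting onto a truly random subspace, it multiplies by a matrix of i.i.d. Gaussian entries scaled by $1/\sqrt{k}$ (a common stand-in), and the names `random_projection`, `euclidean`, and the sizes `d = 500`, `k = 200` are purely illustrative.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Map each d-dimensional point to k dimensions using a Gaussian matrix.

    Assumption: a k x d matrix with i.i.d. N(0, 1) entries, scaled by
    1/sqrt(k), is used in place of a projection onto a random subspace.
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(row[j] * p[j] for j in range(d)) for row in matrix]
        for p in points
    ]


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Ten random points in dimension d = 500, sketched down to k = 200.
data_rng = random.Random(1)
points = [[data_rng.uniform(-1.0, 1.0) for _ in range(500)] for _ in range(10)]
sketches = random_projection(points, k=200)

# Ratio (sketched distance) / (true distance) for every pair of points.
ratios = [
    euclidean(sketches[i], sketches[j]) / euclidean(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
```

If the lemma behaves as expected, every entry of `ratios` should be close to $1$, and the spread should shrink as `k` grows.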

It would be great to understand for which computational problems sketching is possible and how efficient it can be made. There are quite a few nice results (both upper and lower bounds) along these lines (see, e.g., graph sketching or a recent book about sketching for numerical linear algebra), but a general understanding has yet to emerge.

## Sketching for metrics

One of the main motivations to study sketching is fast computation and indexing of similarity measures $\mathrm{sim}(x, y)$ between two objects $x$ and $y$. Oftentimes similarity between objects is modeled by some metric $d(x, y)$ (but not always! think KL divergence): for instance, the above example of the Euclidean distance falls into this category. Thus, instantiating the above general question, one can ask: for which metric spaces do good sketches exist? That is, when is it possible to compute a short sketch $\mathrm{sketch}(x)$ of a point $x$ such that, given two sketches $\mathrm{sketch}(x)$ and $\mathrm{sketch}(y)$, one is able to estimate the distance $d(x, y)$?

The following communication game captures the question of sketching metrics. Alice and Bob each have a point from a metric space $X$ (say, $x$ and $y$, respectively). Suppose, in addition, that either $d_X(x, y) \le r$ or $d_X(x, y) > D \cdot r$ (where $r$ and $D$ are parameters known in advance). Alice and Bob send messages $\mathrm{sketch}(x)$ and $\mathrm{sketch}(y)$, each $s$ bits long, to Charlie, who is supposed to distinguish the two cases (whether $d_X(x, y)$ is small or large) with probability at least $0.99$. We assume that all three parties are allowed to use shared randomness. Our main goal is to understand the trade-off between $D$ (approximation) and $s$ (sketch size).

Arguably, the most important metric spaces are $\ell_p$ spaces. Formally, for $1 \leq p \leq \infty$ we define $\ell_p^d$ to be a $d$-dimensional space equipped with distance

$\|x - y\|_p = \Bigl(\sum_{i=1}^d |x_i - y_i|^p\Bigr)^{1/p}$

(when $p = \infty$ this expression should be understood as $\max_{1 \leq i \leq d} |x_i - y_i|$). One can similarly define $\ell_p$ spaces for $0 < p < 1$; even though the triangle inequality does not hold in this case, it is nevertheless a meaningful notion of distance.
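The definition above translates directly into code. The helper name `lp_distance` is mine, and the three points at the bottom are just a small example showing how the triangle inequality breaks for $p = 1/2$.

```python
import math


def lp_distance(x, y, p):
    """The l_p distance between x and y for 0 < p <= infinity."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if p <= 0:
        raise ValueError("p must be positive")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        # The p = infinity case is the maximum coordinate difference.
        return max(diffs, default=0.0)
    return sum(t ** p for t in diffs) ** (1.0 / p)


# For p = 1/2 the triangle inequality fails on these three points:
# going from a to c directly is "longer" than going through b.
a, b, c = [0.0, 0.0], [1.0, 0.0], [1.0, 1.0]
direct = lp_distance(a, c, 0.5)
detour = lp_distance(a, b, 0.5) + lp_distance(b, c, 0.5)
```

Here `direct` exceeds `detour`, which is exactly the failure of the triangle inequality mentioned above.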

It turns out that $\ell_p$ spaces exhibit very interesting behavior when it comes to sketching. Indyk showed that for $0 < p \le 2$ one can achieve approximation $D = 1 + \varepsilon$ and sketch size $s = O(1 / \varepsilon^2)$ for every $\varepsilon > 0$ (for $1 \le p \le 2$ this was established before by Kushilevitz, Ostrovsky and Rabani). It is quite remarkable that these bounds do not depend on the dimension of the space. On the other hand, for $\ell_p^d$ with $p > 2$ the dependence on the dimension is necessary. It turns out that for constant approximation $D = O(1)$ the optimal sketch size is $\widetilde{\Theta}(d^{1 - 2/p})$.
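To give a feeling for why the dimension drops out, here is a rough illustration for $p = 1$ based on the classical idea of projecting onto Cauchy ($1$-stable) random directions and taking a median. This is a simplified, real-valued version and not the exact bit-bounded sketch from the results cited above; the function names and the choice of `k = 1001` measurements are my own assumptions.

```python
import math
import random
import statistics


def cauchy_matrix(k, d, seed=0):
    """A k x d matrix of i.i.d. standard Cauchy entries (shared randomness)."""
    rng = random.Random(seed)
    # tan(pi * (U - 1/2)) is standard Cauchy when U is uniform on [0, 1).
    return [
        [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)]
        for _ in range(k)
    ]


def linear_sketch(matrix, x):
    """The sketch of x is simply the matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def estimate_l1(sketch_x, sketch_y):
    """Estimate the l_1 distance from two sketches.

    By 1-stability, each coordinate of sketch_x - sketch_y is Cauchy with
    scale equal to the l_1 distance, and the median of the absolute value
    of a Cauchy variable equals its scale.
    """
    return statistics.median(abs(a - b) for a, b in zip(sketch_x, sketch_y))


d, k = 300, 1001
data_rng = random.Random(7)
x = [data_rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [data_rng.uniform(-1.0, 1.0) for _ in range(d)]

A = cauchy_matrix(k, d)  # both parties build A from the same seed
true_l1 = sum(abs(a - b) for a, b in zip(x, y))
estimate = estimate_l1(linear_sketch(A, x), linear_sketch(A, y))
```

Note that the number of measurements `k` controls the accuracy of `estimate` and does not need to grow with `d`.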

Are there any other examples of metrics that admit efficient sketches (say, with constant $D$ and $s$)? One simple observation is that if a metric embeds well into $\ell_p$ for $0 < p \le 2$, then one can sketch this metric well. Formally, we say that a map between metric spaces $f \colon X \to Y$ is an embedding with distortion $\widetilde{D}$, if

$d_X(x_1, x_2) \leq C \cdot d_Y\bigl(f(x_1), f(x_2)\bigr) \leq \widetilde{D} \cdot d_X(x_1, x_2)$

for every $x_1, x_2 \in X$ and for some $C > 0$. It is immediate to see that if a metric space $X$ embeds into $\ell_p$ for $0 < p \le 2$ with distortion $O(1)$, then one can sketch $X$ with $s = O(1)$ and $D = O(1)$. Thus, we know that any metric that embeds well into $\ell_p$ with $0 < p \le 2$ is efficiently sketchable. Are there any other examples? The amazing answer is that we don’t know!
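To make the definition concrete, here is a small helper that computes the best distortion of a map restricted to a finite sample of points. Since the scaling constant $C$ is free, the best $\widetilde{D}$ on the sample is the ratio between the largest and the smallest expansion $d_Y(f(x_1), f(x_2)) / d_X(x_1, x_2)$. The function name `distortion` and the toy example (the identity map from $\ell_\infty^2$ to $\ell_1^2$) are illustrative choices of mine.

```python
import math
from itertools import combinations


def distortion(points, f, d_X, d_Y):
    """Smallest distortion of f on a finite sample of distinct points."""
    expansions = []
    for a, b in combinations(points, 2):
        dx = d_X(a, b)
        if dx == 0:
            continue  # assumption: sample points are pairwise distinct
        expansions.append(d_Y(f(a), f(b)) / dx)
    smallest, largest = min(expansions), max(expansions)
    if smallest == 0:
        return math.inf  # f collapses two distinct points
    return largest / smallest


def l_inf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def l_1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


# Identity map from the plane with the l_infinity norm to the plane with
# the l_1 norm, evaluated on three sample points.
sample = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
sample_distortion = distortion(sample, lambda p: p, l_inf, l_1)
```

On this sample the value is $2$, matching the general bound $\|v\|_\infty \le \|v\|_1 \le 2 \|v\|_\infty$ in the plane. Of course, a finite sample can only give a lower bound on the distortion of a map on the whole space.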

## Our results

Our result shows that for a very important class of metrics—normed spaces—embedding into $\ell_p$ is the only possible way to obtain good sketches. Formally, if a normed space $X$ allows sketches of size $s$ for approximation $D$, then for every $\varepsilon > 0$ the space $X$ embeds into $\ell_{1 - \varepsilon}$ with distortion $O(sD / \varepsilon)$. This result together with the above upper bound by Indyk provides a complete characterization of normed spaces that admit good sketches.

Taking the above result in the contrapositive, we see that non-embeddability implies lower bounds for sketches. This is great, since it potentially allows us to employ many sophisticated non-embeddability results proved by geometers and functional analysts. Specifically, we prove two new lower bounds for sketches: for the planar Earth Mover’s Distance (building on a non-embeddability theorem by Naor and Schechtman) and for the trace norm (non-embeddability was proved by Pisier). In addition to it, we are able to unify certain known results: for instance, classify $\ell_p$ spaces and the cascaded norms in terms of “sketchability”.
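To spell out the contrapositive quantitatively, write the distortion bound from the theorem as $K \cdot sD / \varepsilon$, where $K$ is my name for the absolute constant hidden in the $O(\cdot)$ notation. Suppose that every embedding of $X$ into $\ell_{1 - \varepsilon}$ has distortion at least $\Delta$ (again, $\Delta$ is just notation for this note). Then any sketch of size $s$ for approximation $D$ must satisfy

$\Delta \leq K \cdot \frac{sD}{\varepsilon} \quad \Longrightarrow \quad s \geq \frac{\varepsilon \Delta}{K D}, \quad \text{that is,} \quad s = \Omega\bigl(\varepsilon \Delta / D\bigr).$

So a distortion lower bound of $\Delta$ translates into a lower bound on the product $sD$, up to the factor $\varepsilon$.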

## Overview of the proof

Let me outline the main steps of the proof of the implication “good sketches imply good embeddings”. The following definition is central to the proof. Let us call a map $f \colon X \to Y$ between two metric spaces $(s_1, s_2, \tau_1, \tau_2)$-threshold, if for every $x_1, x_2 \in X$:

• $d_X(x_1, x_2) \leq s_1$ implies $d_Y\bigl(f(x_1), f(x_2)\bigr) \leq \tau_1$,
• $d_X(x_1, x_2) \geq s_2$ implies $d_Y\bigl(f(x_1), f(x_2)\bigr) \geq \tau_2$.
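The two conditions above are easy to test on a finite set of points. The checker below is only a sanity-check tool with hypothetical names (`is_threshold_map`, `doubling`); the example uses integer points on the real line and the map $t \mapsto 2t$.

```python
from itertools import combinations


def is_threshold_map(points, f, d_X, d_Y, s1, s2, tau1, tau2):
    """Check the (s1, s2, tau1, tau2)-threshold property on a finite sample."""
    for a, b in combinations(points, 2):
        dx = d_X(a, b)
        dy = d_Y(f(a), f(b))
        # Close points must stay within tau1 of each other.
        if dx <= s1 and dy > tau1:
            return False
        # Far points must stay at least tau2 apart.
        if dx >= s2 and dy < tau2:
            return False
    return True


def abs_distance(u, v):
    return abs(u - v)


def doubling(t):
    return 2 * t


line_points = list(range(21))  # the integers 0, 1, ..., 20

# Doubling sends distance <= 1 to distance <= 2 and distance >= 10 to >= 20.
ok = is_threshold_map(line_points, doubling, abs_distance, abs_distance,
                      1, 10, 2, 20)
# It is not a (1, 10, 1, 10)-threshold map: distance 1 becomes 2 > 1.
not_ok = is_threshold_map(line_points, doubling, abs_distance, abs_distance,
                          1, 10, 1, 10)
```

Note that the checker says nothing about pairs at distance strictly between `s1` and `s2`, which is precisely what makes threshold maps so much weaker than embeddings.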

One should think of threshold maps as very weak embeddings that merely preserve certain distance scales.

The proof can be divided into two parts. First, we prove that for a normed space $X$ that allows sketches of size $s$ and approximation $D$ there exists a $(1, O(sD), 1, 10)$-threshold map to a Hilbert space. Then, we prove that the existence of such a map implies the existence of an embedding into $\ell_{1 - \varepsilon}$ with distortion $O(sD / \varepsilon)$.

The first half goes roughly as follows. Assume that there is no $(1, O(sD), 1, 10)$-threshold map from $X$ to a Hilbert space. Then, by convex duality, this implies certain Poincaré-type inequalities on $X$. This, in turn, implies sketching lower bounds for $\ell_{\infty}^k(X)$ (the direct sum of $k$ copies of $X$, where the norm is defined as the maximum of the norms of the components) by a result of Andoni, Jayram and Pătrașcu (which is based on the very important notion of information complexity). Then, crucially using the fact that $X$ is a normed space, we conclude that $X$ itself does not have good sketches (this step follows from the fact that every normed space is of type $1$ and of cotype $\infty$).

The second half uses tools from nonlinear functional analysis. First, building on an argument of Johnson and Randrianarivony, we show that for normed spaces a $(1, O(sD), 1, 10)$-threshold map into a Hilbert space implies a uniform embedding into a Hilbert space, that is, a map $f \colon X \to H$, where $H$ is a Hilbert space, such that

$L\bigl(\|x_1 - x_2\|_X\bigr) \leq \bigl\|f(x_1) - f(x_2)\bigr\|_H \leq U\bigl(\|x_1 - x_2\|_X\bigr),$

where $L, U \colon [0, \infty) \to [0, \infty)$ are non-decreasing functions such that $L(t) > 0$ for every $t > 0$ and $U(t) \to 0$ as $t \to 0$. Both $L$ and $U$ are allowed to depend only on $s$ and $D$. This step uses a certain Lipschitz extension-type theorem and averaging via bounded invariant means. Finally, we conclude the proof by applying theorems of Aharoni-Maurey-Mityagin and Nikishin to obtain the desired (linear) embedding of $X$ into $\ell_{1 - \varepsilon}$.

## Open problems

Let me finally state several open problems.

The first obvious open problem is to extend our result to as large a class of general metric spaces as possible. Two notable examples one should keep in mind are the Khot-Vishnoi space and the Heisenberg group. In both cases, the space admits good sketches (since both spaces are embeddable into $\ell_2$-squared), but neither of them is embeddable into $\ell_1$. I do not know whether these spaces are embeddable into $\ell_{1 - \varepsilon}$, but I am inclined to suspect so.

The second open problem deals with linear sketches. For a normed space, one can require that a sketch be of the form $\mathrm{sketch}(x) = Ax$, where $A$ is a random matrix generated using shared randomness. Our result can then be interpreted as follows: any normed space that allows sketches of size $s$ and approximation $D$ allows a linear sketch with one linear measurement and approximation $O(sD)$ (this follows from the fact that for $\ell_{1 - \varepsilon}$ there are good linear sketches). But can we always construct a linear sketch of size $f(s)$ and approximation $g(D)$, where $f(\cdot)$ and $g(\cdot)$ are some (ideally, not too quickly growing) functions?

Finally, the third open problem is about spaces that allow essentially no non-trivial sketches. Can one characterize the $d$-dimensional normed spaces for which any sketch for approximation $O(1)$ must have size $\Omega(d)$? The only example I can think of is a space that contains a subspace that is close to $\ell_{\infty}^{\Omega(d)}$. Is this the only case?

Ilya