Dmitry Yarotsky's home page


Hi, I'm an applied mathematician at Skoltech and, part-time, at the Institute for Information Transmission Problems. Previously, I worked at the software company Datadvance as an expert in data analysis and optimization.

My research

I'm interested in various kinds of applied mathematics. Below I describe some of the topics I have worked on.

Deep neural networks

Discontinuous weight selection and emergence of subprograms

There is a general DeVore-Howard-Micchelli theorem establishing upper bounds on the approximation rates of any parameterized approximation model (for example, a neural network) under the assumption of continuous parameter selection. However, if the continuity assumption is dropped, deep neural nets with a rather standard (but deep!) architecture and the standard ReLU activation function can surpass these bounds. This can be achieved by speeding up the computation with something like subprograms, see this paper. The construction is inspired by one used by Shannon in his work on Boolean circuits.

Fast approximation of smooth functions with deep ReLU networks

Smooth functions can typically be approximated more efficiently than non-smooth functions. Usually, this is achieved by choosing an approximating model of appropriate smoothness (e.g., using cubic splines rather than linear splines if the approximated function is smooth). However, deep neural networks can provide efficient (in a sense, optimal) approximation rates for smooth functions even if their activation function is only piecewise-linear (standard ReLU), see the paper (preprint).
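As a small illustration of how depth can compensate for a piecewise-linear activation, here is a sketch of a standard "sawtooth" construction that approximates x^2 on [0, 1] by composing ReLU-built tent maps. It is offered only as an example of the general idea and is not necessarily the construction used in the linked paper; the function names (`relu`, `tent`, `approx_square`, `max_error`) are made up for this sketch. With m composed tent maps, the result is the piecewise-linear interpolant of x^2 on a grid of step 2^(-m), so the maximal error is 2^(-2m-2): it shrinks exponentially in depth.

```python
def relu(x: float) -> float:
    """Standard ReLU activation."""
    return max(x, 0.0)


def tent(x: float) -> float:
    """Tent map on [0, 1], written as a combination of three ReLU units."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def approx_square(x: float, depth: int) -> float:
    """Approximate x**2 on [0, 1] using `depth` composed tent maps.

    Computes x - sum_{s=1..depth} g_s(x) / 4**s, where g_s is the
    s-fold composition of the tent map with itself.
    """
    result = x
    g = x
    for s in range(1, depth + 1):
        g = tent(g)              # g is now the s-fold composition
        result -= g / 4.0 ** s
    return result


def max_error(depth: int, n_grid: int = 4096) -> float:
    """Largest deviation from x**2 over a uniform grid on [0, 1]."""
    return max(
        abs(approx_square(k / n_grid, depth) - (k / n_grid) ** 2)
        for k in range(n_grid + 1)
    )


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, max_error(depth), 2.0 ** (-2 * depth - 2))
```

Each extra layer costs a fixed number of ReLU units but divides the error by 4, which is the kind of gain that a shallow piecewise-linear model cannot match with the same number of units.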

Voxel features for 3D shape recognition

I have written a small library for computation of various geometric features of voxelized 3D shapes. These features can be used in automated classification of 3D shapes, e.g. by training an XGBoost classifier; see the paper.
[Images: feature examples; predictions]

Space tether systems

Space tether systems are an interesting class of systems, potentially useful for various purposes such as space debris removal, satellite collocation, etc. In this joint work with our Astrium colleagues we studied a "hub-and-spoke" pyramidal formation rotating about a central satellite and holding another satellite beneath it. Unfortunately, this configuration requires relatively high fuel consumption.

So, in this paper we proposed another, freely moving (no fuel!) formation serving the same purpose. Instead of a circle, deputy satellites now move along Lissajous curves. We find relations between the system's parameters ensuring that the satellites and tethers never collide and the main satellite remains immobile, and show how all these relations can be satisfied.

Interestingly, the model seems to be especially stable if there are at least 5 deputy satellites. Also interestingly, the tethers can get entangled during operation; we have only partially been able to separate the cases with and without entanglement (based on the winding number invariant).

Surrogate Based Optimization (SBO)

In this post I tried to explain in simple terms the idea of SBO and its most natural version based on Expected Improvement (EI).

My research in this area concerned the following question: can EI-based SBO fail, in the sense of never getting near the true global optimum? The expected answer is "yes", but the proof is not obvious because the behavior of SBO trajectories is not well understood on a rigorous level. Nevertheless, in this paper I give a rigorous example of failure in a sort of "analytic black hole" scenario.
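For readers who have not seen EI written out, here is a minimal sketch of the usual closed-form expression, assuming a minimization problem and a surrogate that gives a Gaussian prediction with mean `mu` and standard deviation `sigma` at a candidate point. The function name and argument names are chosen for this sketch; they do not come from the post or the paper.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement over the best value seen so far (minimization).

    The surrogate's prediction at the candidate point is assumed to be
    Gaussian, N(mu, sigma**2); `best` is the smallest observed value.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # A point predicted to be worse than `best` can still have positive EI
    # if the surrogate is uncertain about it.
    print(expected_improvement(mu=1.0, sigma=0.5, best=0.8))
    print(expected_improvement(mu=1.0, sigma=0.0, best=0.8))
```

An SBO step then evaluates the true objective where EI is largest. EI becomes very small wherever the surrogate is confident that no improvement is possible, so such regions are rarely revisited; this is the kind of behavior that makes the failure question above nontrivial.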


Explicit error formulas seem to be rare in approximation theory. One well-known example is the beautiful integral error formula for ordinary polynomial interpolation with \(N\) knots \(x_1,\ldots,x_N\): \[f(x)-\widehat{f}(x)=\frac{\prod_{n=1}^N (x-x_n)}{N!} \int_{s_n\ge 0, \sum_{n=0}^N s_n=1} \frac{d^N f}{dx^N}\Big(\sum_{n=0}^N s_n x_n \Big) d\mathbf{s},\] where \(x\equiv x_0\). This formula immediately implies, for example, that the interpolants converge to the true function if it is analytic in a sufficiently large domain. In this paper I show that this formula can be generalized to interpolation by exponential or Gaussian functions using the Harish-Chandra-Itzykson-Zuber (HCIZ) integral; in particular, for Gaussian basis functions \(e^{-(x-x_n)^2/2}\) \[f(x)- \widehat{f}(x) = \frac{ \prod_{n=1}^N (x-x_n) }{ N! Z} \int_{S^{2N+1}} \int_{\mathbb{U}(N)} e^{\mathrm{tr}(X U^{\dagger} P_\mathbf{v}^{\dagger} \widetilde{X} P_\mathbf{v} U)} e^{-\frac{x^2}{2}}\Big[\prod_{n=1}^N \big(\frac{d}{dq}-x_n\big)\Big] e^{\frac{q^2}{2}}f(q)\Big|_{q = \mathbf{v}^{\dagger} \widetilde{X}\mathbf{v}} d\mathbf{v}dU\] Though this expression looks complicated, it can be used to prove convergence of interpolants almost as easily as in the polynomial case. This result does not seem to extend to more general radial basis functions; e.g. the proof breaks down even for basis functions of the form \(\sum_{k} c_k e^{-(x-x_n)^2/a_k}\). The HCIZ integral is well known in random matrix theory, representation theory and quantum field theory; it is interesting that it also has applications to interpolation theory.
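The polynomial formula is easy to check numerically. The sketch below assumes that \(d\mathbf{s}\) denotes the uniform measure on the simplex normalized to total mass 1; this normalization is an assumption of the sketch, and it is the one under which the formula is exact for \(f(q)=q^N\), where the error must equal \(\prod_{n=1}^N (x-x_n)\). The integral is estimated by plain Monte Carlo, and all function names are made up for this example.

```python
import math
import random


def lagrange_interpolant(f, nodes, x):
    """Value at x of the polynomial interpolating f at the given nodes."""
    total = 0.0
    for i, xi in enumerate(nodes):
        weight = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += f(xi) * weight
    return total


def error_from_integral(dNf, nodes, x, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the integral error formula.

    dNf is the N-th derivative of f, N = len(nodes). The simplex is sampled
    uniformly by normalizing independent exponential variates (assumed
    normalization: total mass 1).
    """
    rng = random.Random(seed)
    points = [x] + list(nodes)          # x_0 = x, followed by x_1..x_N
    acc = 0.0
    for _ in range(n_samples):
        e = [rng.expovariate(1.0) for _ in points]
        q = sum(ei * p for ei, p in zip(e, points)) / sum(e)
        acc += dNf(q)
    n = len(nodes)
    prefactor = math.prod(x - xn for xn in nodes) / math.factorial(n)
    return prefactor * acc / n_samples


if __name__ == "__main__":
    nodes, x = [0.0, 0.5, 1.0], 0.3
    direct = math.exp(x) - lagrange_interpolant(math.exp, nodes, x)
    estimate = error_from_integral(math.exp, nodes, x)  # exp is its own derivative
    print(direct, estimate)
```

The two printed numbers should agree up to Monte Carlo noise; increasing `n_samples` tightens the agreement.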

Quantum Spin Systems

My research in this area mostly concerned rigorous analysis of ground states with the help of cluster expansions.

In this paper I developed a quadratic form-based perturbation theory and used it to prove that small perturbations of the AKLT model remain gapped (which was widely believed, but hard to prove).

In this paper (preprint) I prove uniqueness of the ground state of a weakly interacting system in a strong sense involving "most general quantum boundary conditions", and discuss how one can interpret these conditions.

In this paper (preprint) I show that the so-called "commensurate-incommensurate transition" in the AKLT model can be explained by a peculiar Poisson-type random walk with a single reversal.

My industrial experience

At Datadvance, I was one of the developers of the Macros/pSeven Core library and other custom software for optimization and predictive modeling. This software is used at a number of major engineering companies, e.g. at Airbus. Our toolbox of regression methods is described in this paper.

My hobbies

Occasionally, I like to take part in Kaggle's challenges in data mining.