PhD defense Junjie Yang: Enhancing surrogate regression methods for structured prediction: an odyssey with loss functions
Télécom Paris, 19 place Marguerite Perey, F-91120 Palaiseau, Amphi 2, and by videoconference
Jury
- Florence d’Alché-Buc, Professor, Télécom Paris, France (Supervisor)
- Thomas Bonald, Professor, Télécom Paris, France (Examiner)
- Claire Boyer, Professor, Université Paris-Saclay, France (Examiner)
- Carlo Ciliberto, Associate Professor, University College London, United Kingdom (Reviewer)
- Caio Corro, Associate Professor, INSA Rennes, France (Examiner)
- Nicolas Courty, Professor, Université Bretagne Sud / IRISA, France (Reviewer)
- Matthieu Labeau, Associate Professor, Télécom Paris, France (Supervisor)
- Titouan Vayer, Researcher, INRIA Lyon, France (Examiner)
Abstract
Machine learning, a rapidly evolving field at the intersection of mathematics and computer science, has transformed both scientific research and real-world applications. Beyond classification and regression, it now makes it possible to tackle structured prediction (SP), enabling breakthroughs in machine translation, metabolite identification, and protein structure prediction, among others.

SP remains challenging, however, due to its large, combinatorial output space. Surrogate regression methods such as implicit loss embedding (ILE) and output kernel regression (OKR) address this by mapping structured outputs into a Hilbert space, converting SP into a vector-valued learning problem. However, they still face several challenges: (i) their performance depends heavily on the design of complex loss functions, (ii) the implicit or infinite-dimensional nature of surrogate spaces limits their integration with neural networks, and (iii) inference remains computationally demanding. This thesis aims to improve surrogate regression methods to overcome these limitations. For this purpose, we leverage several families of mathematical tools, including optimal transport (OT), kernel methods, and contrastive learning.

We first address structured prediction for labeled graphs, building on recent advances in optimal transport distances. We introduce the fused network Gromov-Wasserstein (FNGW) distance, which incorporates edge features into the transport computation. Using FNGW as the loss function in the ILE framework, we develop ILE-FNGW, which produces predictions as FNGW barycenters. To tame inference complexity, we propose Any2Graph-FNGW, a neural-network-based model that predicts directly in a relaxed surrogate graph space, simplifying inference through efficient decoding.

Next, building on OKR, we introduce deep sketched output kernel regression (DSOKR), a new framework that makes neural networks usable as surrogate hypothesis spaces for general structured outputs. DSOKR constructs a finite-dimensional subspace of a reproducing kernel Hilbert space (RKHS) using random sketching. This approach preserves flexibility: any neural architecture can process the inputs, and the output layer only has to predict the coefficients of a finite-dimensional basis.

Finally, we introduce a novel SP framework, explicit loss embedding (ELE), which replaces predefined loss functions for structured data with a learnable, differentiable loss. This loss is defined as the squared Euclidean distance between neural-network-parameterized embeddings and is learned directly from output data through contrastive learning. The learned loss serves a dual purpose: during training, it yields a finite-dimensional surrogate regression problem; during inference, it defines a differentiable decoding objective.

We evaluate all proposed methods on supervised graph prediction tasks, highlighting the distinct characteristics of each SP approach.
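For readers less familiar with the surrogate regression viewpoint, here is a schematic formulation of the training and decoding steps shared by ILE and OKR; the notation is generic and is only a sketch of the standard framework, not the thesis's exact setting.

```latex
% Schematic surrogate regression pipeline:
% (1) embed outputs via \psi : \mathcal{Y} \to \mathcal{H} and regress in \mathcal{H},
% (2) decode a prediction by a pre-image (arg min) step over the output space.
\begin{align}
  \hat{h} &\in \operatorname*{arg\,min}_{h \in \mathcal{F}}
      \frac{1}{n} \sum_{i=1}^{n} \big\lVert h(x_i) - \psi(y_i) \big\rVert_{\mathcal{H}}^{2}, \\
  \hat{f}(x) &\in \operatorname*{arg\,min}_{y \in \mathcal{Y}}
      \big\lVert \psi(y) - \hat{h}(x) \big\rVert_{\mathcal{H}}^{2}.
\end{align}
% ILE covers target losses that factor as
% \Delta(y, y') = \langle \psi(y), V \psi(y') \rangle_{\mathcal{H}}
% for a bounded linear operator V; OKR takes \psi to be the feature map of an output kernel.
```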
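As background on the optimal transport ingredient, the following is the published fused Gromov-Wasserstein (FGW) objective that FNGW builds on; the additional edge-feature term specific to FNGW is only indicated in a comment, since its precise form is defined in the thesis.

```latex
% Fused Gromov-Wasserstein between attributed graphs G_1 = (a, C_1, h_1) and
% G_2 = (b, C_2, h_2): node features a_i, b_j, structure matrices C_1, C_2,
% node weight vectors h_1, h_2, trade-off \alpha and exponent q.
\begin{equation}
  \mathrm{FGW}_{q,\alpha}(G_1, G_2)
  = \min_{\pi \in \Pi(h_1, h_2)} \sum_{i,j,k,l}
    \Big[ (1-\alpha)\, d(a_i, b_j)^{q}
        + \alpha\, \big| C_1(i,k) - C_2(j,l) \big|^{q} \Big]\, \pi_{ij}\, \pi_{kl}
\end{equation}
% FNGW augments this objective with a term comparing edge features (e.g. bond
% types), so that the transport plan also aligns labeled edges.
```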
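Below is a minimal NumPy sketch of the sketched-output-kernel idea behind DSOKR, assuming a sub-sampling (Nyström-style) sketch and a Gaussian output kernel; names such as `sketched_feature` and `decode`, and the stand-in data, are illustrative and not the thesis's implementation.

```python
# Hedged sketch: a random sketch of the output kernel defines a finite basis;
# a neural regressor would be trained to predict coefficients in that basis,
# and decoding scores a finite candidate set against the prediction.
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Output kernel k(y, y') evaluated between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
n, m, d = 200, 20, 16                      # training size, sketch size, output dim
Y_train = rng.normal(size=(n, d))          # stand-in for vectorized training outputs

# Sub-sampling sketch: pick m anchor outputs, then whiten with (R^T K R)^{-1/2}.
anchors = rng.choice(n, size=m, replace=False)
K_mm = gaussian_kernel(Y_train[anchors], Y_train[anchors])
eigval, eigvec = np.linalg.eigh(K_mm)
whiten = eigvec @ np.diag(1.0 / np.sqrt(np.clip(eigval, 1e-8, None))) @ eigvec.T

def sketched_feature(Y):
    """Finite-dimensional surrogate embedding in R^m for each row of Y."""
    return gaussian_kernel(Y, Y_train[anchors]) @ whiten

Phi_train = sketched_feature(Y_train)      # regression targets for the network's output layer

def decode(pred, Y_candidates):
    """Return the candidate output whose sketched embedding is closest to the prediction."""
    Phi_c = sketched_feature(Y_candidates)
    return Y_candidates[np.argmin(((Phi_c - pred) ** 2).sum(-1))]
```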
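Finally, a minimal PyTorch sketch of the ELE ingredient of learning a differentiable loss by contrastive training of an output encoder; the encoder architecture, the InfoNCE-style objective, and the noise-based augmentation are assumptions for illustration only.

```python
# Hedged sketch: an output encoder is trained with a contrastive objective on
# output data alone; the learned loss between two outputs is then the squared
# Euclidean distance of their embeddings, which stays differentiable and can
# drive both surrogate regression and gradient-based decoding.
import torch
import torch.nn.functional as F

embed = torch.nn.Sequential(                 # output encoder: R^64 -> R^128
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 128)
)

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss pairing each view in z1 with its counterpart in z2."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature         # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

def learned_loss(y, y_prime):
    """Differentiable surrogate loss between two outputs once the encoder is trained."""
    return ((embed(y) - embed(y_prime)) ** 2).sum(-1)

# One contrastive step on a batch of (augmented) output representations.
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)
y_batch = torch.randn(32, 64)                # stand-in for vectorized outputs
view1 = y_batch + 0.05 * torch.randn_like(y_batch)
loss = info_nce(embed(view1), embed(y_batch))
opt.zero_grad(); loss.backward(); opt.step()
```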