It's like watching the development of electrodynamics or the laws of nuclear physics unfold in real time. Instead of just throwing more compute at the problem, there are people looking closely at the algorithms that lead toward machine intelligence. Soon the systems themselves will be able to do this kind of work: try out different implementations, keep and instantiate the ones that work, and discard the changes that turn out not to be optimal.
https://github.com/yifanzhang-pro/deep-delta-learning/blob/master/Deep_Delta_Learning.pdf
Quote:
Deep Delta Learning
Yifan Zhang¹, Yifeng Liu², Mengdi Wang¹, Quanquan Gu²
¹Princeton University, ²University of California, Los Angeles
yifzhang@princeton.edu
January 1, 2026
Abstract
The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector k(X) and a gating scalar β(X). We provide a spectral analysis of this operator, demonstrating that the gate β(X) enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
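
The abstract describes the Delta Operator only in words, so the concrete form below is my own reading, not the paper's code: I assume D(X) = I − β(X)·k(X)k(X)ᵀ with k(X) unit-normalized, so that β = 0 gives the identity, β = 1 an orthogonal projection, and β = 2 a Householder reflection, and I sketch the "rank-1 injection" as writing a learned scalar value along the same direction k. A minimal PyTorch sketch under those assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeltaOperatorBlock(nn.Module):
    """Hypothetical layer: (I - beta k k^T) x plus a gated rank-1 write along k."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)    # reflection direction k(X)
        self.to_beta = nn.Linear(dim, 1)   # gating scalar beta(X)
        self.to_v = nn.Linear(dim, 1)      # scalar "value" written along k (my assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim)
        k = F.normalize(self.to_k(x), dim=-1)        # unit-norm direction
        beta = 2.0 * torch.sigmoid(self.to_beta(x))  # gate in (0, 2): identity -> projection -> reflection
        v = self.to_v(x)                             # (batch, 1)
        erase = beta * (x * k).sum(dim=-1, keepdim=True) * k  # remove old content along k
        write = beta * v * k                                   # inject new content along k
        return x - erase + write

block = DeltaOperatorBlock(dim=64)
y = block(torch.randn(8, 64))   # -> shape (8, 64)

The (0, 2) gate range and the scalar value head are choices I made to match the interpolation the abstract describes; the actual paper may parameterize both differently.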
Quote:
Conclusion
We have introduced Deep Delta Learning, a novel architecture built upon an adaptive, geometric residual connection. Through analysis, we have demonstrated that its core component, the Delta Operator, unifies identity mapping, projection, and reflection into a single, continuously differentiable module. This unification is controlled by a simple learned scalar gate, which dynamically shapes the spectrum of the layer-to-layer transition operator. By empowering the network to learn transformations with negative eigenvalues in a data-dependent fashion, DDL offers a significant and principled increase in expressive power while retaining the foundational benefits of the residual learning paradigm.
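
The "negative eigenvalues" point follows directly from the Householder-like form assumed above, D(X) = I − β(X)·k(X)k(X)ᵀ with ‖k(X)‖ = 1; a short worked spectrum:

\[
D k = k - \beta\, k\,(k^{\top}k) = (1-\beta)\,k,
\qquad
D u = u \quad \text{for every } u \perp k,
\]

so the spectrum is $\{\,1-\beta,\;1,\dots,1\,\}$: $\beta = 0$ gives the identity, $\beta = 1$ an orthogonal projection (eigenvalue $0$ along $k$), $\beta = 2$ a reflection (eigenvalue $-1$), and any $\beta > 1$ yields the negative eigenvalue $1-\beta < 0$ that the conclusion highlights.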