🧠 Neural ODE: Learning Dynamical Systems

Demo controls:

  - Select system type
  - Number of training iterations
  - Method used for forward prediction (training uses Euler)
  - Use selected solver (RK4) for training: enables RK4 unrolling + adjoint; may be slower
  - Continuous adjoint (memory-light, experimental): integrates the augmented ODE backward instead of unrolling
  - Step size for gradient descent (learning rate)
  - Number of neurons in the hidden layer
  - Integration step size
  - Number of training sequences
  - Time steps per trajectory
  - Extend the test beyond training
  - How fast the spiral shrinks
  - Epochs between UI updates
  - Compute loss only every N steps (1 = all timesteps)

Live training readouts: Training Loss, Epochs, Data Points, Test Error.

Introduction to Neural Ordinary Differential Equations

Standard Recurrent Neural Networks (RNNs) and their variants, such as LSTMs and GRUs, operate on discrete sequences of data. They update their hidden state at fixed intervals, which can be restrictive for modeling continuous-time processes or data with irregular time steps. Neural Ordinary Differential Equations (Neural ODEs) propose a paradigm shift: instead of defining a discrete transition function, we model the continuous-time dynamics of a system's hidden state, $\mathbf{z}(t)$, using a neural network.

The core idea is to parameterize the derivative of the hidden state with respect to time, $\frac{d\mathbf{z}(t)}{dt}$, using a neural network, denoted as $f_\theta$. This transforms the problem of learning a sequence-to-sequence mapping into learning the vector field of a dynamical system.

Mathematical Formulation

Let the state of a system at time $t$ be represented by a vector $\mathbf{z}(t) \in \mathbb{R}^D$. A Neural ODE defines the dynamics of this state via the following initial value problem (IVP): $$ \frac{d\mathbf{z}(t)}{dt} = f_\theta(\mathbf{z}(t), t) $$ where $f_\theta$ is a neural network with parameters $\theta$. Given an initial state $\mathbf{z}(t_0)$, the state at any later time $t_1$ can be found by solving this ODE: $$ \mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f_\theta(\mathbf{z}(t), t) dt $$ This integration is performed by a numerical ODE solver, such as Euler's method or the more accurate Runge-Kutta methods (e.g., RK4). The entire mapping from $\mathbf{z}(t_0)$ to $\mathbf{z}(t_1)$ can be viewed as the continuous-depth analogue of a residual network, wrapped in a single ODESolve operation.
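
For concreteness, here is a minimal sketch of such an ODESolve with fixed-step Euler and RK4 updates, written in PyTorch. The network architecture, state dimension, step counts, and all names below are illustrative assumptions, not the demo's actual implementation.

```python
import torch
import torch.nn as nn

# A small MLP standing in for the learned vector field f_theta(z, t).
# The 2-D state and hidden width of 32 are arbitrary choices for this sketch.
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def f_theta(z, t):
    return net(z)  # autonomous dynamics in this sketch; t is ignored

def euler_step(f, z, t, dt):
    return z + dt * f(z, t)

def rk4_step(f, z, t, dt):
    k1 = f(z, t)
    k2 = f(z + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(z + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(z + dt * k3, t + dt)
    return z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def odesolve(f, z0, t0, t1, n_steps, step=rk4_step):
    """Fixed-step integration of dz/dt = f(z, t) from t0 to t1."""
    dt = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z = step(f, z, t, dt)
        t = t + dt
    return z

z0 = torch.randn(1, 2)                    # initial state z(t_0)
z1 = odesolve(f_theta, z0, 0.0, 1.0, 20)  # approximate z(t_1)
```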

Training via the Adjoint Method

Training the network, that is, finding the optimal parameters $\theta$, requires differentiating the loss through the ODE solver. The naive approach unrolls the solver's operations and stores every intermediate state for backpropagation, so its memory cost grows with the number of solver steps; this becomes expensive for long time horizons or high-precision solvers.
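
As a hedged illustration of this unrolling approach, the following sketch performs one training step by backpropagating through every Euler step of the solver. It reuses `net`, `f_theta`, `euler_step`, and `odesolve` from the previous sketch; the optimizer, loss, and data layout are assumptions made for illustration.

```python
# Reuses net, f_theta, euler_step, and odesolve from the previous sketch.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

def unrolled_training_step(z0, times, targets, substeps=5):
    """One gradient step that backpropagates through every solver step.

    times:   observation times t_0 < t_1 < ... < t_K (floats)
    targets: ground-truth states at times[1:], shape (K, 1, 2)
    """
    optimizer.zero_grad()
    z, loss = z0, torch.tensor(0.0)
    for k in range(len(times) - 1):
        # Integrate from t_k to t_{k+1}; every intermediate state stays in the
        # autograd graph, so memory grows with the total number of solver steps.
        z = odesolve(f_theta, z, times[k], times[k + 1], substeps, step=euler_step)
        loss = loss + ((z - targets[k]) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```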

The adjoint method provides an elegant and memory-efficient alternative: it avoids backpropagating through the solver's internal steps. We define a loss function $L$ that depends on the final state, $L(\mathbf{z}(t_1))$. The goal is to compute the gradient of the loss with respect to the parameters, $\frac{dL}{d\theta}$, without differentiating through the solver's internal operations or storing its intermediate states.

The adjoint state, $\mathbf{a}(t) = \frac{dL}{d\mathbf{z}(t)}$, represents how the loss changes with respect to the hidden state at time $t$. Its dynamics are given by another ODE: $$ \frac{d\mathbf{a}(t)}{dt} = -\mathbf{a}(t)^T \frac{\partial f_\theta(\mathbf{z}(t), t)}{\partial \mathbf{z}} $$ This adjoint ODE is solved backward in time, from $t_1$ to $t_0$, starting from the condition $\mathbf{a}(t_1) = \frac{\partial L}{\partial \mathbf{z}(t_1)}$. The gradient of the loss with respect to the parameters $\theta$ is then obtained from a third integral, also evaluated backward in time: $$ \frac{dL}{d\theta} = -\int_{t_1}^{t_0} \mathbf{a}(t)^T \frac{\partial f_\theta(\mathbf{z}(t), t)}{\partial \theta} dt $$ This method has a constant memory cost with respect to the number of integration steps, making it highly scalable. This interactive demo allows you to switch between standard backpropagation (unrolling) and the memory-efficient continuous adjoint method.
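
The following sketch shows how such a continuous adjoint pass could look with a fixed-step Euler backward integration, using autograd only for the vector-Jacobian products $\mathbf{a}^T \partial f/\partial \mathbf{z}$ and $\mathbf{a}^T \partial f/\partial \theta$. It is an illustrative sketch under the same assumed names as the earlier snippets, not the demo's code; mature implementations exist in libraries such as torchdiffeq (`odeint_adjoint`).

```python
def adjoint_gradients(f, params, z1, dLdz1, t0, t1, n_steps):
    """Continuous-adjoint sketch: integrate z(t), a(t) = dL/dz(t), and dL/dtheta
    backward from t1 to t0 with fixed-step Euler."""
    dt = (t1 - t0) / n_steps
    z = z1.detach().clone()
    a = dLdz1.detach().clone()
    dLdtheta = [torch.zeros_like(p) for p in params]
    t = t1
    for _ in range(n_steps):
        with torch.enable_grad():
            z_req = z.detach().requires_grad_(True)
            f_val = f(z_req, t)
            # Vector-Jacobian products a^T df/dz and a^T df/dtheta via autograd.
            vjps = torch.autograd.grad(f_val, [z_req, *params], grad_outputs=a,
                                       allow_unused=True)
        z = z - dt * f_val.detach()             # dz/dt =  f_theta(z, t)
        a = a + dt * vjps[0]                    # da/dt = -a^T df/dz, stepped backward
        for g, vjp in zip(dLdtheta, vjps[1:]):  # accumulate a^T df/dtheta
            if vjp is not None:
                g.add_(dt * vjp)
        t = t - dt
    return dLdtheta, a                          # a is now dL/dz(t0)

# Hypothetical usage: gradients of the example loss L = ||z(t1)||^2.
z1 = odesolve(f_theta, torch.randn(1, 2), 0.0, 1.0, 20).detach()
dLdz1 = 2.0 * z1
grads, dLdz0 = adjoint_gradients(f_theta, list(net.parameters()), z1, dLdz1, 0.0, 1.0, 20)
```

Only the current `z`, `a`, and the accumulated parameter gradients are stored, so memory stays constant in the number of integration steps, at the price of re-evaluating $f_\theta$ during the backward pass.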

About This Interactive Demo

This tool allows you to explore the behavior of Neural ODEs on various dynamical systems.

  1. Generate Data: Select a system type (e.g., Spiral, Pendulum) and click to generate ground-truth trajectories (one possible generator is sketched after this list).
  2. Train Model: Adjust hyperparameters like learning rate, network size, and the ODE solver. Then, train the model to learn the system's dynamics from the generated data.
  3. Test Generalization: Evaluate the trained model's ability to predict trajectories beyond the training time horizon, a key test of its generalization capabilities.
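
As a hypothetical example of step 1, here is one plausible way Spiral ground-truth trajectories could be generated: 2-D states that rotate while decaying toward the origin, with a decay rate playing the role of the "how fast the spiral shrinks" control. The functional form, parameter names, and default values are assumptions for illustration, not the demo's actual generator.

```python
import numpy as np

def make_spiral_data(n_traj=20, n_steps=100, dt=0.1, decay=0.1, omega=1.0, seed=0):
    """Hypothetical Spiral generator:
    z(t) = r0 * exp(-decay*t) * [cos(omega*t + phi), sin(omega*t + phi)],
    sampled at n_steps uniformly spaced times for n_traj random initial conditions."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps) * dt
    trajectories = []
    for _ in range(n_traj):
        r0 = rng.uniform(0.5, 2.0)           # random initial radius
        phi = rng.uniform(0.0, 2.0 * np.pi)  # random initial phase
        r = r0 * np.exp(-decay * t)
        trajectories.append(np.stack([r * np.cos(omega * t + phi),
                                      r * np.sin(omega * t + phi)], axis=-1))
    return t, np.stack(trajectories)         # shapes: (n_steps,), (n_traj, n_steps, 2)
```
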
By experimenting with the controls, you can build an intuition for how Neural ODEs work and how different settings affect their performance in learning complex dynamics.