Standard Recurrent Neural Networks (RNNs) and their variants, such as LSTMs and GRUs, operate on discrete sequences of data. They update their hidden state once per step, which implicitly assumes evenly spaced observations and is restrictive for modeling continuous-time processes or irregularly sampled data. Neural Ordinary Differential Equations (Neural ODEs) propose a paradigm shift: instead of defining a discrete transition function, we model the continuous-time dynamics of a system's hidden state, $\mathbf{z}(t)$, using a neural network.
The core idea is to parameterize the derivative of the hidden state with respect to time, $\frac{d\mathbf{z}(t)}{dt}$, using a neural network, denoted as $f_\theta$. This transforms the problem of learning a sequence-to-sequence mapping into learning the vector field of a dynamical system.
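As a concrete illustration, $f_\theta$ can be as simple as a small multilayer perceptron that takes the current state and time and returns a vector with the same dimension as the state. The PyTorch module below is a minimal sketch of this idea; the class name `ODEFunc`, the layer widths, and the choice of feeding $t$ in as an extra input feature are illustrative assumptions rather than part of the method itself.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """A small MLP that parameterizes the vector field dz/dt = f_theta(z, t)."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden),  # +1 input feature for the time t
            nn.Tanh(),
            nn.Linear(hidden, dim),      # output matches the state dimension D
        )

    def forward(self, z: torch.Tensor, t: float) -> torch.Tensor:
        # Broadcast the scalar time across the batch and append it to the state,
        # so the learned dynamics can depend on t as well as z.
        t_col = torch.full((z.shape[0], 1), float(t), dtype=z.dtype, device=z.device)
        return self.net(torch.cat([z, t_col], dim=-1))
```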
Let the state of a system at time $t$ be represented by a vector $\mathbf{z}(t) \in \mathbb{R}^D$. A Neural ODE defines the dynamics of this state via the following initial value problem (IVP):
$$
\frac{d\mathbf{z}(t)}{dt} = f_\theta(\mathbf{z}(t), t)
$$
where $f_\theta$ is a neural network with parameters $\theta$. Given an initial state $\mathbf{z}(t_0)$, the state at any later time $t_1$ can be found by solving this ODE:
$$
\mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f_\theta(\mathbf{z}(t), t) dt
$$
This integration is performed by a numerical ODE solver, such as Euler's method or the more accurate Runge-Kutta methods (e.g., RK4). The entire process, from $\mathbf{z}(t_0)$ to $\mathbf{z}(t_1)$, can be viewed as a single continuous-depth residual network layer, which we can call an ODESolve function.
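To make the ODESolve step concrete, here is a minimal fixed-step solver sketch, assuming the `ODEFunc` module above; the function name `odesolve` and the default step count are illustrative. A single Euler step, $\mathbf{z} \leftarrow \mathbf{z} + h\, f_\theta(\mathbf{z}, t)$, has exactly the form of a residual-block update, which is what motivates the continuous-depth view; production implementations typically use adaptive-step solvers instead.

```python
def odesolve(func, z0, t0, t1, n_steps=100, method="rk4"):
    """Integrate dz/dt = func(z, t) from t0 to t1 with a fixed-step solver.

    `func` is any callable mapping (z, t) -> dz/dt, e.g. the ODEFunc sketch above.
    """
    h = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        if method == "euler":
            # Forward Euler: the same form as a residual-block update.
            z = z + h * func(z, t)
        elif method == "rk4":
            # Classical 4th-order Runge-Kutta step.
            k1 = func(z, t)
            k2 = func(z + 0.5 * h * k1, t + 0.5 * h)
            k3 = func(z + 0.5 * h * k2, t + 0.5 * h)
            k4 = func(z + h * k3, t + h)
            z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        else:
            raise ValueError(f"unknown method: {method}")
        t = t + h
    return z
```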
Training the network, that is, finding the optimal parameters $\theta$, requires backpropagating through the ODE solver. A naive approach, unrolling the solver's operations and storing all intermediate states for the backward pass, quickly becomes expensive, especially for long time horizons or high-precision solvers that take many steps.
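For reference, the naive strategy is just ordinary reverse-mode autodiff through every solver step. With the sketches above, calling `.backward()` on a loss computed from the output of `odesolve` keeps every intermediate state in the autograd graph, so memory grows linearly with the number of steps. A toy training step under those assumptions:

```python
func = ODEFunc(dim=2)
optimizer = torch.optim.Adam(func.parameters(), lr=1e-3)

z0 = torch.randn(32, 2)       # a batch of (toy) initial states
target = torch.randn(32, 2)   # a (toy) regression target for z(t1)

z1 = odesolve(func, z0, t0=0.0, t1=1.0, n_steps=100)  # graph spans all 100 steps
loss = ((z1 - target) ** 2).mean()

optimizer.zero_grad()
loss.backward()               # differentiates through every intermediate state
optimizer.step()
```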
The Adjoint Method provides an elegant and memory-efficient solution. It avoids the need for backpropagation through the solver's internal steps. We define a loss function $L$ that depends on the final state, $L(\mathbf{z}(t_1))$. The key is to compute the gradient of the loss with respect to the parameters, $\frac{dL}{d\theta}$, without explicitly differentiating the ODESolve function.
The adjoint state, $\mathbf{a}(t) = \frac{dL}{d\mathbf{z}(t)}$, represents how the loss changes with respect to the hidden state at time $t$. Its dynamics are given by another ODE:
$$
\frac{d\mathbf{a}(t)}{dt} = -\mathbf{a}(t)^T \frac{\partial f_\theta(\mathbf{z}(t), t)}{\partial \mathbf{z}}
$$
This adjoint ODE is solved backward in time, from $t_1$ to $t_0$, starting from the condition $\mathbf{a}(t_1) = \frac{\partial L}{\partial \mathbf{z}(t_1)}$. The gradient of the loss with respect to the parameters $\theta$ is then obtained from a third integral, also evaluated backward in time:
$$
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} \mathbf{a}(t)^T \frac{\partial f_\theta(\mathbf{z}(t), t)}{\partial \theta} \, dt
$$
Because only the current values of $\mathbf{z}(t)$, $\mathbf{a}(t)$, and the accumulated gradient need to be carried along, this method has a constant memory cost with respect to the number of integration steps, making it highly scalable.
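The sketch below shows one way this backward sweep can be implemented on top of the `ODEFunc` and `odesolve` sketches above. It re-integrates $\mathbf{z}(t)$ backward in time alongside the adjoint $\mathbf{a}(t)$ and the accumulated parameter gradient, obtaining the $\mathbf{a}^T \partial f_\theta / \partial \mathbf{z}$ and $\mathbf{a}^T \partial f_\theta / \partial \theta$ terms as vector-Jacobian products from `torch.autograd.grad`. Fixed-step Euler and the function name `adjoint_backward` are illustrative simplifications; the original formulation solves a single augmented ODE with the same (possibly adaptive) solver used in the forward pass.

```python
def adjoint_backward(func, z1, a1, t0, t1, n_steps=100):
    """Backward sweep of the adjoint method (fixed-step Euler sketch).

    Given z1 = z(t1) and a1 = dL/dz(t1), integrate z, a, and the parameter
    gradient backward from t1 to t0. Only the current z and a are stored,
    so memory is constant in the number of steps.
    """
    params = [p for p in func.parameters() if p.requires_grad]
    grad_theta = [torch.zeros_like(p) for p in params]

    h = (t1 - t0) / n_steps
    z, a, t = z1.detach(), a1.detach(), t1
    for _ in range(n_steps):
        with torch.enable_grad():
            z_req = z.requires_grad_(True)
            f_val = func(z_req, t)
            # Vector-Jacobian products a^T df/dz and a^T df/dtheta.
            vjps = torch.autograd.grad(f_val, [z_req] + params,
                                       grad_outputs=a, allow_unused=True)
        vjp_z, vjp_params = vjps[0], vjps[1:]

        # One Euler step of size -h for each backward ODE:
        #   dz/dt    =  f(z, t)
        #   da/dt    = -a^T df/dz
        #   da_th/dt = -a^T df/dtheta,  with dL/dtheta = a_th(t0)
        z = (z - h * f_val).detach()
        a = (a + h * vjp_z).detach()
        for g, vjp in zip(grad_theta, vjp_params):
            if vjp is not None:
                g.add_(h * vjp)
        t = t - h

    return a, grad_theta  # a is now dL/dz(t0); grad_theta holds dL/dtheta
```

To train with this routine, one would compute $\mathbf{a}(t_1) = \frac{\partial L}{\partial \mathbf{z}(t_1)}$ from the loss, call it, and copy each entry of `grad_theta` into the corresponding parameter's `.grad` before the optimizer step. Re-integrating $\mathbf{z}(t)$ backward can drift from the forward trajectory for stiff dynamics, which is one reason solver choice and error tolerances matter in practice.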
This interactive demo lets you switch between standard backpropagation through the unrolled solver and the more efficient continuous adjoint method, and explore the behavior of Neural ODEs on various dynamical systems.