Neural Alpha-Beta Chess Engine

This interactive demonstration shows how neural networks can enhance classical game-playing algorithms such as alpha-beta pruning in chess.

Topics: Game AI · Decision Trees · Neural Networks · Interactive Learning

Mathematical Framework & Algorithm Description

1. Classical Alpha-Beta Pruning

The classical alpha-beta algorithm is a minimax search with pruning that explores the game tree to find optimal moves. It maintains two bounds: α (lower bound for maximizing player) and β (upper bound for minimizing player).

Minimax Value Function:

$$V(s, d) = \begin{cases} \text{eval}(s) & \text{if } d = 0 \text{ or } s \text{ is terminal} \\ \max_{a \in A(s)} V(\text{result}(s,a), d-1) & \text{if player}(s) = \text{MAX} \\ \min_{a \in A(s)} V(\text{result}(s,a), d-1) & \text{if player}(s) = \text{MIN} \end{cases}$$

Where s is the game state, d is the remaining search depth, A(s) is the set of legal actions, and result(s,a) is the state after applying action a to state s.

Alpha-Beta Pruning Conditions:

$$\text{Prune if: } \alpha \geq \beta$$

Where α is updated as: $\alpha = \max(\alpha, v)$ for MAX nodes, and β is updated as: $\beta = \min(\beta, v)$ for MIN nodes.
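For reference, here is a minimal Python sketch of the hard alpha-beta recursion above. The functions `evaluate`, `is_terminal`, `legal_moves`, and `apply_move` are placeholders for a game-state interface, not the demo's actual API.

```python
import math

def alpha_beta(state, depth, alpha, beta, maximizing):
    """Hard alpha-beta: minimax with pruning once alpha >= beta.
    evaluate, is_terminal, legal_moves, apply_move are placeholders
    for the engine's game-state interface."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for move in legal_moves(state):
            child = alpha_beta(apply_move(state, move), depth - 1, alpha, beta, False)
            value = max(value, child)
            alpha = max(alpha, value)      # tighten the MAX player's lower bound
            if alpha >= beta:              # prune: MIN will never allow this line
                break
        return value
    value = math.inf
    for move in legal_moves(state):
        child = alpha_beta(apply_move(state, move), depth - 1, alpha, beta, True)
        value = min(value, child)
        beta = min(beta, value)            # tighten the MIN player's upper bound
        if alpha >= beta:                  # prune: MAX will never allow this line
            break
    return value
```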

2. Neural/Soft Alpha-Beta Framework

The soft variant replaces hard max/min operations with differentiable approximations, enabling gradient-based optimization and neural network integration. This approach is inspired by AlphaZero-style architectures.

Soft Max/Min Operations:

$$\text{softmax}(a, b; \tau) = \frac{a \cdot e^{a/\tau} + b \cdot e^{b/\tau}}{e^{a/\tau} + e^{b/\tau}}$$
$$\text{softmin}(a, b; \tau) = -\text{softmax}(-a, -b; \tau)$$

The temperature parameter τ controls the softness: as $\tau \to 0$, the soft maximum approaches the hard max operation; as $\tau \to \infty$, it approaches the arithmetic mean of its arguments.
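A small numerical sketch of the two-argument soft max/min (the function names are illustrative). Shifting by the larger argument before exponentiating is a standard stability trick and does not change the result.

```python
import math

def soft_max2(a, b, tau):
    """Smooth two-argument maximum: a Boltzmann-weighted average of a and b.
    As tau -> 0 this approaches max(a, b); large tau blends the two values."""
    m = max(a, b)                      # shift for numerical stability
    wa = math.exp((a - m) / tau)
    wb = math.exp((b - m) / tau)
    return (a * wa + b * wb) / (wa + wb)

def soft_min2(a, b, tau):
    """Smooth two-argument minimum, defined as -softmax(-a, -b)."""
    return -soft_max2(-a, -b, tau)

# Low temperature is close to the hard max; high temperature is close to the mean.
print(soft_max2(1.0, 2.0, 0.1))    # ~2.000
print(soft_max2(1.0, 2.0, 10.0))   # ~1.525
```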

Soft Pruning Gates:

$$g_{\text{MAX}}(\tilde{v}, \beta) = \sigma\left(\frac{\beta - \tilde{v}}{\tau_g}\right)$$
$$g_{\text{MIN}}(\tilde{v}, \alpha) = \sigma\left(\frac{\tilde{v} - \alpha}{\tau_g}\right)$$

Where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function, and $\tau_g$ controls gate sensitivity. The gates provide soft pruning by dampening contributions from "pruned" branches.

Soft Search Update:

$$v_{\text{new}} = g \cdot \tilde{v} + (1-g) \cdot v_{\text{old}}$$

Where $\tilde{v}$ is the soft max/min result and $g$ is the pruning gate value.
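Putting the gate and the update together, a minimal sketch of one soft-search step at a MAX node, assuming the formulas above (function names and example values are illustrative only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_update_max(v_old, v_soft, beta, tau_g):
    """One soft-search update at a MAX node.
    The gate stays open (g -> 1) while the soft value is below beta and
    closes (g -> 0) once it exceeds beta, damping 'pruned' contributions
    instead of cutting them off."""
    g = sigmoid((beta - v_soft) / tau_g)        # g_MAX(v~, beta)
    return g * v_soft + (1.0 - g) * v_old, g    # (v_new, gate value)

# Below beta the gate is nearly open and the update adopts the soft value;
# far above beta the gate is nearly closed and v_old is mostly kept.
print(soft_update_max(v_old=0.0, v_soft=0.3, beta=0.5, tau_g=0.1))  # (~0.264, ~0.881)
print(soft_update_max(v_old=0.0, v_soft=1.0, beta=0.5, tau_g=0.1))  # (~0.007, ~0.007)
```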

3. Game State Evaluation & Terminal Conditions

Position Evaluation Function:

$$\text{eval}(s) = \phi \cdot E_{\text{mg}}(s) + (1-\phi) \cdot E_{\text{eg}}(s)$$

Tapered evaluation combining middle-game ($E_{\text{mg}}$) and end-game ($E_{\text{eg}}$) assessments, where $\phi \in [0,1]$ is the game phase factor based on remaining material.

Material and Positional Evaluation:

$$E(s) = \sum_{p \in \text{pieces}} \text{sign}(p) \cdot \left[V(p) + \text{PST}(p, \text{pos}(p))\right]$$

Where $V(p)$ is the material value of piece $p$, $\text{PST}(p, \text{pos}(p))$ is the piece-square table value, and $\text{sign}(p) = \pm 1$ based on piece color.
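A compact sketch of the tapered material-plus-PST evaluation described in this subsection. The piece values, phase weights, and table layout here are illustrative placeholders rather than the demo's actual tuning.

```python
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}
PHASE_WEIGHT = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4, "K": 0}
MAX_PHASE = 24   # start position: 8 minors*1 + 4 rooks*2 + 2 queens*4

def evaluate(pieces, pst_mg, pst_eg):
    """pieces: list of (symbol, square, sign), sign = +1 white / -1 black.
    pst_mg / pst_eg: middle- and end-game piece-square tables indexed [symbol][square]."""
    phase = sum(PHASE_WEIGHT[sym] for sym, _, _ in pieces)
    phi = min(phase, MAX_PHASE) / MAX_PHASE       # 1.0 in the middle game, 0.0 with bare kings
    e_mg = sum(sign * (PIECE_VALUE[sym] + pst_mg[sym][sq]) for sym, sq, sign in pieces)
    e_eg = sum(sign * (PIECE_VALUE[sym] + pst_eg[sym][sq]) for sym, sq, sign in pieces)
    return phi * e_mg + (1.0 - phi) * e_eg
```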

Terminal State Rewards:

$$R_{\text{terminal}}(s) = \begin{cases} \pm(100000 - d) & \text{if checkmate at depth } d \\ 0 & \text{if stalemate or draw} \\ \text{eval}(s) & \text{if depth limit reached} \end{cases}$$

Checkmate rewards decrease with the distance from the root (here $d$ counts plies searched from the root rather than the remaining depth of the minimax recursion), so faster mates receive higher scores. All draws (stalemate, 50-move rule, insufficient material, threefold repetition) receive a neutral reward.
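For illustration, the terminal-reward cases above might be scored as follows, from the MAX (White) player's point of view; the predicate names are placeholders, not the demo's API.

```python
MATE_SCORE = 100_000

def terminal_score(state, ply_from_root):
    """Score a leaf position. is_checkmate, is_draw, side_to_move_is_max,
    and evaluate are placeholder state queries."""
    if is_checkmate(state):
        mate = MATE_SCORE - ply_from_root          # faster mates score higher
        return -mate if side_to_move_is_max(state) else mate
    if is_draw(state):   # stalemate, 50-move rule, repetition, insufficient material
        return 0
    return evaluate(state)                         # depth limit reached: static evaluation
```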

4. Search Enhancements

Zobrist Hashing for Transposition Tables:

$$h(s) = \bigoplus_{i} Z_{\text{piece}[i]}[\text{square}[i]] \oplus Z_{\text{flags}}$$

Where $\bigoplus$ is the XOR operation, $Z$ are random 64-bit numbers, and flags include castling rights, en passant, and side to move. Enables $O(1)$ position lookup and repetition detection.
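A minimal sketch of building the Zobrist key tables and hashing a position; the flag handling is simplified and the key layout is an assumption, not the demo's exact scheme.

```python
import random

random.seed(0)   # fixed seed so hashes are reproducible between runs

PIECES = "PNBRQKpnbrqk"
# One random 64-bit key per (piece, square), plus keys for the side to move,
# each castling-rights combination, and each en-passant file.
Z_PIECE = {(p, sq): random.getrandbits(64) for p in PIECES for sq in range(64)}
Z_SIDE_TO_MOVE = random.getrandbits(64)
Z_CASTLING = [random.getrandbits(64) for _ in range(16)]
Z_EN_PASSANT = [random.getrandbits(64) for _ in range(8)]

def zobrist_hash(piece_on, white_to_move, castling_mask, ep_file):
    """piece_on: dict square -> piece symbol for occupied squares only."""
    h = 0
    for sq, p in piece_on.items():
        h ^= Z_PIECE[(p, sq)]
    if white_to_move:
        h ^= Z_SIDE_TO_MOVE
    h ^= Z_CASTLING[castling_mask]                 # castling_mask in 0..15
    if ep_file is not None:
        h ^= Z_EN_PASSANT[ep_file]
    return h

# Incremental update for a quiet move: h ^= Z_PIECE[(p, from_sq)] ^ Z_PIECE[(p, to_sq)]
```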

Move Ordering (MVV-LVA):

$$\text{priority}(m) = 10000 \cdot \mathbb{1}_{\text{capture}}(m) + V(\text{captured}) - V(\text{attacker})$$

Most Valuable Victim - Least Valuable Attacker ordering prioritizes capturing high-value pieces with low-value pieces, improving alpha-beta cutoff efficiency.
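As a sketch, the MVV-LVA priority can be computed as follows; the `captured` and `attacker` fields are assumed move attributes used only for illustration.

```python
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 20_000}

def mvv_lva_priority(move):
    """Ordering score: captures before quiet moves, most valuable victim first,
    least valuable attacker first. move.captured / move.attacker are assumed
    attributes holding piece symbols."""
    if move.captured is None:
        return 0
    return 10_000 + PIECE_VALUE[move.captured] - PIECE_VALUE[move.attacker]

# Usage: moves.sort(key=mvv_lva_priority, reverse=True)  # e.g. PxQ is tried before QxP
```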

Quiescence Search:

$$Q(s, \alpha, \beta) = \begin{cases} \max(\text{eval}(s), \max_{c \in \text{captures}(s)} Q(\text{result}(s,c), \alpha, \beta)) & \text{if MAX} \\ \min(\text{eval}(s), \min_{c \in \text{captures}(s)} Q(\text{result}(s,c), \alpha, \beta)) & \text{if MIN} \end{cases}$$

Extends the search in tactical positions by considering only forcing moves (captures, and optionally checks) until a "quiet" position is reached, reducing horizon effects and improving tactical accuracy.
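A sketch of a capture-only quiescence search matching the recursion above, again using placeholder state-interface functions (`evaluate`, `capture_moves`, `apply_move`):

```python
def quiescence(state, alpha, beta, maximizing):
    """Stand pat on the static evaluation, then search captures only
    until the position is quiet."""
    stand_pat = evaluate(state)
    if maximizing:
        if stand_pat >= beta:
            return stand_pat                  # already enough for a cutoff
        alpha = max(alpha, stand_pat)
        best = stand_pat
        for move in capture_moves(state):
            best = max(best, quiescence(apply_move(state, move), alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break
        return best
    if stand_pat <= alpha:
        return stand_pat
    beta = min(beta, stand_pat)
    best = stand_pat
    for move in capture_moves(state):
        best = min(best, quiescence(apply_move(state, move), alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```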

5. Probabilistic Move Selection (Soft Mode)

Move Choice Distribution:

$$P(a_i) = \frac{\exp(c_i / \tau_{\text{choice}})}{\sum_j \exp(c_j / \tau_{\text{choice}})}$$

Where $c_i$ is the contribution of move $a_i$ from the soft search, and $\tau_{\text{choice}}$ controls exploration vs exploitation. Lower temperatures lead to more deterministic play.

Contribution Accumulation:

$$c_i = \left(\prod_{j=1}^{i} g_j\right) v_i$$

Each move's value is weighted by the cumulative product of the pruning gates of the moves searched up to and including it, so moves examined after the gates have begun to close (i.e., moves a hard search would have pruned) contribute less to the choice distribution.
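Combining the two formulas, a sketch of sampling a move from the soft-search results; the list layout and function name are illustrative.

```python
import math
import random

def choose_move(moves, values, gates, tau_choice):
    """Sample a move from the soft-search results.
    values[i] is the soft value of move i, gates[i] the pruning gate
    computed when it was searched; lists are assumed to be in search order."""
    contributions = []
    gate_product = 1.0
    for v, g in zip(values, gates):
        gate_product *= g                 # cumulative product g_1 * ... * g_i
        contributions.append(gate_product * v)
    # Softmax over contributions at the choice temperature (shifted by the
    # maximum before exponentiating for numerical stability).
    m = max(contributions)
    weights = [math.exp((c - m) / tau_choice) for c in contributions]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(moves, weights=probs, k=1)[0], probs
```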

6. Implementation Details

Key Features:

  • Iterative Deepening: Searches progressively deeper depths (1, 2, ..., d) to improve move ordering and enable time management (a driver sketch follows this list)
  • Transposition Tables: Cache search results using Zobrist hashing to avoid redundant computations
  • Complete Rule Implementation: Handles all chess rules including castling, en passant, promotions, and draw conditions
  • Dual Mode Operation: Toggle between classical hard alpha-beta and differentiable soft variant
  • Game State Detection: Comprehensive checkmate, stalemate, and draw detection (50-move rule, threefold repetition, insufficient material)
  • Parameter Tuning: Interactive adjustment of temperature parameters (τ, τ_g, τ_choice) in soft mode
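A minimal sketch of an iterative-deepening driver on top of the alpha-beta sketch given earlier; the interface names are the same placeholders, and time management and transposition-table reuse are only noted in comments.

```python
import math

def iterative_deepening(state, max_depth):
    """Search depths 1, 2, ..., max_depth in turn, keeping the best move of the
    deepest completed iteration. Uses the alpha_beta / legal_moves / apply_move
    placeholders from the earlier sketch; a real driver would also honour a time
    budget and reuse a transposition table filled by shallower iterations to
    order moves at deeper ones."""
    best_move, best_value = None, -math.inf
    for depth in range(1, max_depth + 1):
        depth_move, depth_value = None, -math.inf
        for move in legal_moves(state):
            value = alpha_beta(apply_move(state, move), depth - 1,
                               -math.inf, math.inf, False)
            if value > depth_value:
                depth_move, depth_value = move, value
        best_move, best_value = depth_move, depth_value
    return best_move, best_value
```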

Research Applications:

This implementation demonstrates how classical game-playing algorithms can be enhanced with neural network techniques, providing a foundation for exploring differentiable game tree search, policy gradient methods, and hybrid AI architectures that combine symbolic reasoning with deep learning.