dimension one

machine learning phenomena through minimalist examples

  • minimal sharpness and l2 norm at stable learning rates, edge of stability at the largest ones

    Jun 2026

    for L(x,y)=(1-xy)^2/2, sharpness on the solution set xy=1 equals the squared l2 norm x^2+y^2. GD empirically converges near the minimum of both for all tested stable learning rates. This coincides with edge of stability behavior only near the largest stable learning rates.

  • a phase portrait of gradient descent on a 1-hidden linear neuron

    May 2026

    for the simple factorized loss (1-xy)^2/2, GD with learning rate above 1/2 cannot converge to xy=1 without oscillating across it, and with learning rate above 1 it cannot converge at all

  • proxies for generalization: what survives in 1d when generalization gap is explicit?

    Apr 2026

    in a 1d classification example where the generalization gap is explicit, the failures of common generalization proxies can be understood geometrically: most track sensitivity rather than correctness of the classifier

  • at interpolation, generalization is distance to Bayes

    Mar 2026

    in the same setup as the previous post, the generalization gap turns out to have a simple analytical form: Bayes error plus a disagreement term, computable in one dimension

  • same model, same optimizer -- different generalization

    Feb 2026

    a minimalist reproduction of why generalization cannot be understood without looking at the data
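
a minimal sketch of the setting shared by the two most recent entries above, in python with only the standard library. the function names `sharpness` and `run_gd`, the starting point (2.0, 0.1), and the learning rates 0.1 and 1.1 are illustrative assumptions, not taken from the posts. it compares the largest Hessian eigenvalue with x^2+y^2 at a point reached by GD, and runs plain GD once below and once above learning rate 1.

```python
import math


def sharpness(x, y):
    """Largest Hessian eigenvalue of L(x, y) = (1 - x*y)^2 / 2."""
    # Hessian entries: L_xx = y^2, L_yy = x^2, L_xy = 2xy - 1.
    a, d, b = y * y, x * x, 2.0 * x * y - 1.0
    # Closed form for the top eigenvalue of the symmetric matrix [[a, b], [b, d]].
    return (a + d) / 2.0 + math.hypot((a - d) / 2.0, b)


def run_gd(x, y, lr, steps=2000):
    """Plain gradient descent on L; stops early if the iterates overflow."""
    for _ in range(steps):
        r = 1.0 - x * y  # residual; the gradient is (-r*y, -r*x)
        x, y = x + lr * r * y, y + lr * r * x
        if not (math.isfinite(x) and math.isfinite(y)):
            break
    return x, y


# Assumed illustrative settings: start at (2.0, 0.1), one small and one large step size.
for lr in (0.1, 1.1):
    x, y = run_gd(2.0, 0.1, lr)
    print(f"lr={lr}: final point=({x}, {y})")
    if math.isfinite(x) and math.isfinite(y):
        print(f"  residual |1 - xy| = {abs(1.0 - x * y)}")
        print(f"  sharpness = {sharpness(x, y)}, x^2 + y^2 = {x * x + y * y}")
```

with these choices, the run at 0.1 should settle on xy=1 with sharpness matching x^2+y^2, while the run at 1.1 should not settle there. other starting points or learning rates may behave differently.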

about