dimension one
machine learning phenomena through minimalist examples
-
minimal sharpness and l2 norm at stable learning rates, edge of stability at the largest ones
for L(x,y)=(1-xy)^2/2, the sharpness at any minimizer on xy=1 equals the squared l2 norm x^2+y^2. GD empirically converges near the minimum of this sharpness for all tested stable learning rates. This coincides with edge of stability behavior only near the largest stable learning rates.
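A minimal sketch of the identity between sharpness and squared l2 norm, assuming sharpness means the largest eigenvalue of the Hessian of L. The function names are illustrative and do not come from the post. On xy=1 the Hessian is [[y^2, 1], [1, x^2]], with determinant 0 and trace x^2+y^2, so its top eigenvalue is x^2+y^2.

```python
import math

def hessian(x, y):
    # Hessian entries of L(x, y) = (1 - x*y)**2 / 2
    # d2L/dx2 = y^2, d2L/dy2 = x^2, d2L/dxdy = 2*x*y - 1
    return y * y, 2.0 * x * y - 1.0, x * x

def sharpness(x, y):
    # largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]
    a, b, c = hessian(x, y)
    return (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b * b)

for x in (0.5, 1.0, 2.0):
    y = 1.0 / x  # a point on the curve xy = 1
    print(x, y, sharpness(x, y), x * x + y * y)
```

The last two printed columns should agree at every point on xy=1, and the smallest value along the curve is 2, reached at x=y=1 (and x=y=-1).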
-
a phase portrait of gradient descent on a 1-hidden linear neuron
for the simple factorized loss (1-xy)^2/2, once the learning rate exceeds 1/2, GD cannot converge to xy=1 without oscillating across it, and once it exceeds 1, GD cannot converge at all
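A small illustration of the two thresholds, offered as a sketch rather than the post's own experiment. It takes one GD step from a point close to the flattest minimizer x=y=1 and reports how the residual 1-xy changes. The starting point 0.99 and the function names are assumptions made for the example. To first order the ratio is 1 - lr*(x^2+y^2), and x^2+y^2 is at least 2 on xy=1, so the ratio is negative for every minimizer once lr > 1/2 and below -1 for every minimizer once lr > 1.

```python
def gd_step(x, y, lr):
    # one gradient descent step on L(x, y) = (1 - x*y)**2 / 2
    r = 1.0 - x * y
    return x + lr * r * y, y + lr * r * x

def residual_ratio(x, y, lr):
    # residual 1 - x*y after one step, divided by the residual before it
    r0 = 1.0 - x * y
    x1, y1 = gd_step(x, y, lr)
    return (1.0 - x1 * y1) / r0

# assumed starting point near the flattest minimizer x = y = 1
for lr in (0.25, 0.75, 1.1):
    print(lr, residual_ratio(0.99, 0.99, lr))
```

A ratio between 0 and 1 means the step approaches xy=1 from the same side, a ratio between -1 and 0 means it crosses xy=1 while still shrinking the residual, and a ratio below -1 means the step is pushed away.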
-
proxies for generalization: what survives in 1d when generalization gap is explicit?
in a 1d classification example where the generalization gap is explicit, the failures of common generalization proxies can be understood geometrically: most track sensitivity rather than correctness of the classifier
-
at interpolation, generalization is distance to Bayes
in the same setup as the previous post, the generalization gap turns out to have a simple analytical form: Bayes error plus a disagreement term, computable in one dimension
-
same model, same optimizer -- different generalization
a minimalist reproduction of why generalization cannot be understood without looking at the data