dimension one
machine learning phenomena through minimalist examples
-
proxies for generalization: what survives in 1d when the generalization gap is explicit?
in a 1d classification example where the generalization gap is explicit, the failures of common generalization proxies can be understood geometrically: most track the classifier's sensitivity rather than its correctness; across runs, disagreement mass and decision-flip count are the most stable, while margins, slopes, and sharpness vary much more
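a minimal sketch of what measuring the two stable proxies could look like in 1d, assuming a balanced two-Gaussian mixture and threshold classifiers; the setup, the reading of "disagreement mass" as mass where the learned classifier disagrees with the Bayes classifier, and the reading of "decision flips" as prediction changes between consecutive runs are all illustrative assumptions, not the post's actual code:

```python
# illustrative 1d setup: two Gaussians, one learned threshold per run
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, mu1, sigma = -1.0, 1.0, 1.0      # class-conditional Gaussians
bayes_threshold = 0.0                  # optimal split for the symmetric mixture

def fit_threshold(x, y):
    """Pick the threshold minimizing training error (a crude 1d 'learner')."""
    candidates = np.sort(x)
    errs = [np.mean((x > t).astype(int) != y) for t in candidates]
    return candidates[int(np.argmin(errs))]

def disagreement_mass(t):
    """Probability mass where the learned and Bayes classifiers disagree."""
    lo, hi = sorted((t, bayes_threshold))
    return 0.5 * ((norm.cdf(hi, mu0, sigma) - norm.cdf(lo, mu0, sigma))
                  + (norm.cdf(hi, mu1, sigma) - norm.cdf(lo, mu1, sigma)))

thresholds = []
for run in range(20):                  # repeat the training run
    y = rng.integers(0, 2, size=50)
    x = rng.normal(np.where(y == 1, mu1, mu0), sigma)
    thresholds.append(fit_threshold(x, y))

# decision-flip count: how often consecutive runs classify a probe point differently
probes = np.linspace(-3, 3, 601)
preds = np.array([(probes > t).astype(int) for t in thresholds])
flip_count = (preds[:-1] != preds[1:]).sum(axis=1).mean()

print("mean disagreement mass:", np.mean([disagreement_mass(t) for t in thresholds]))
print("mean decision flips between consecutive runs:", flip_count)
```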
-
at interpolation, generalization is distance to Bayes
in the same setup as the previous post, the generalization gap turns out to have a simple analytical form: the Bayes error plus a disagreement term, both computable in one dimension
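a minimal sketch of the identity, assuming a balanced two-Gaussian mixture and a threshold classifier; here the disagreement term is taken to be the margin-weighted mass of the region where the learned classifier and the Bayes classifier disagree, which is an assumption about the post's exact definition:

```python
# check numerically: test_error(h) = bayes_error + integral over {h != h*} of |2*eta(x) - 1| p(x) dx
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu0, mu1, sigma = -1.0, 1.0, 1.0
p = lambda x: 0.5 * (norm.pdf(x, mu0, sigma) + norm.pdf(x, mu1, sigma))  # marginal density
eta = lambda x: 0.5 * norm.pdf(x, mu1, sigma) / p(x)                     # P(y = 1 | x)

def test_error(t):
    """Error of 'predict 1 iff x > t' under the mixture."""
    # class-0 points above t and class-1 points below t are misclassified
    return 0.5 * (1 - norm.cdf(t, mu0, sigma)) + 0.5 * norm.cdf(t, mu1, sigma)

bayes_error = test_error(0.0)          # Bayes threshold is 0 by symmetry

t = 0.7                                # some run's learned threshold
disagree = lambda x: abs(2 * eta(x) - 1) * p(x)
disagreement_term, _ = quad(disagree, min(t, 0.0), max(t, 0.0))

print(test_error(t))                         # direct test error
print(bayes_error + disagreement_term)       # Bayes error + disagreement term (matches)
```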
-
same model, same optimizer -- different generalization
a minimalist reproduction of why generalization cannot be understood without looking at the data
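a minimal numpy sketch of the flavor of the experiment, assuming a tiny tanh network trained by full-batch gradient descent from the same initialization on two 1d datasets that differ only in class overlap; the architecture, hyperparameters, and dataset construction are all assumptions, not the post's actual code:

```python
# identical model, init, and optimizer; only the data distribution changes
import numpy as np

def make_data(rng, n, sep):
    """Balanced 1d two-Gaussian data; 'sep' controls how much the classes overlap."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(np.where(y == 1, sep, -sep), 1.0)
    return x[:, None], y.astype(float)

def train(x, y, steps=5000, lr=0.5, width=64, seed=0):
    """Full-batch gradient descent on a one-hidden-layer tanh net with logistic loss."""
    rng = np.random.default_rng(seed)                    # same init for every dataset
    w1, b1 = rng.normal(0, 1, (1, width)), np.zeros(width)
    w2, b2 = rng.normal(0, 1 / np.sqrt(width), (width, 1)), np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)
        p = 1 / (1 + np.exp(-(h @ w2 + b2)))             # sigmoid output
        g = (p - y[:, None]) / len(x)                    # d(loss)/d(logit)
        gh = g @ w2.T * (1 - h ** 2)                     # backprop through tanh
        w2 -= lr * h.T @ g
        b2 -= lr * g.sum(0)
        w1 -= lr * x.T @ gh
        b1 -= lr * gh.sum(0)
    return lambda xq: (np.tanh(xq @ w1 + b1) @ w2 + b2 > 0).ravel()

rng = np.random.default_rng(1)
for sep in (2.0, 0.5):                                   # well-separated vs overlapping classes
    xtr, ytr = make_data(rng, 40, sep)
    xte, yte = make_data(rng, 5000, sep)
    f = train(xtr, ytr)                                  # same model, same optimizer, same seed
    tr = np.mean(f(xtr) != ytr.astype(bool))
    te = np.mean(f(xte) != yte.astype(bool))
    print(f"sep={sep}: train error {tr:.2f}, test error {te:.2f}, gap {te - tr:.2f}")
```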