Linear Quadratic Models

The models that I use have a linear final layer and are trained with a quadratic cost function. For this reason I call them linear quadratic (LQ) models, to distinguish them from softmax models trained with a cross-entropy cost function (SCE).
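To make the distinction concrete, here is a minimal numpy sketch of the two heads, not my actual training code; the shapes, weights, and labels are made up for illustration. Both heads share the same linear layer; only the cost differs.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=4)        # output of the shared network body (made up)
W = 0.1 * rng.normal(size=(3, 4))    # final linear layer, 3 classes
target = np.array([0.0, 1.0, 0.0])   # one-hot label

# LQ head: linear outputs scored directly with a quadratic (MSE) cost
y = W @ features
lq_loss = 0.5 * np.sum((y - target) ** 2)

# SCE head: the same linear outputs treated as logits for softmax + cross entropy
z = W @ features
p = np.exp(z - z.max())
p /= p.sum()
sce_loss = -np.sum(target * np.log(p))

print("LQ loss:", lq_loss)
print("SCE loss:", sce_loss)
```

The only structural difference is what happens after the linear layer: LQ compares the raw outputs to the one-hot target, while SCE normalizes them through a softmax first.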

Some of my first experiments were on adversarial attacks, tricking models with images corrupted by noise. I found that LQ models were harder to fool than SCE models: it was harder to generate noise that tricked them, and when I did, the outputs looked different enough that corrupted inputs could easily be detected. I have probably lost those blog posts, and I don't think anyone cares anymore.

I consistently get higher test accuracy with LQ models trained on MNIST and FMNIST, on the order of 0.5%. Many of my other experiments show that LQ and SCE models behave differently, not always better or worse, just with different trade-offs.

Training pushes the outputs of the two kinds of model to different values: very close to either 0 or 1 for LQ, and much larger values (-4 to 13 in one case I looked at) before the softmax is applied for SCE. It is easy to understand how this could cause different behaviour. If the methods that I write about don't work for you, this could be the reason.
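One way to see why the output ranges diverge is to look at the gradients of the two costs. This is a small numpy illustration under assumed example values, not taken from my experiments. The quadratic cost's gradient with respect to the outputs is (y - t), which reaches exactly zero at finite outputs, so LQ outputs can settle near 0 and 1. The cross-entropy gradient with respect to the logits is (softmax(z) - t), which only vanishes as the correct logit goes to infinity, so SCE pre-softmax values keep growing.

```python
import numpy as np

t = np.array([0.0, 1.0, 0.0])  # one-hot target

# LQ: gradient of 0.5*sum((y - t)^2) wrt y is (y - t).
# Outputs near 0/1 already give a near-zero gradient.
y = np.array([0.02, 0.97, 0.01])  # example LQ outputs (made up)
grad_lq = y - t

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# SCE: gradient of cross entropy wrt the logits is (softmax(z) - t).
# Scaling the correct logit up shrinks the gradient but never zeroes it,
# so training keeps pushing the logits to larger values.
grad_norms = []
for scale in (2.0, 6.0, 12.0):
    z = np.array([0.0, scale, 0.0])
    grad_sce = softmax(z) - t
    grad_norms.append(np.linalg.norm(grad_sce))
    print(scale, grad_sce)

print("LQ gradient:", grad_lq)
```

So an LQ model has a finite fixed point to aim at, while an SCE model is always rewarded for pushing its logits further apart, which matches the output ranges described above.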
