Mean Activated Absolute Values

My software library is called Sparana, for sparse parameter analysis. The sparse part is because I wanted to use sparse data structures. The parameter analysis part is because I wanted to look at parameters and what they are doing. Well, if I am going to call my software library that, I should probably have a new way of analysing parameters. I have one. I was planning on developing others, but this one works pretty well for now. (I do have other things that justify the name, just not as well tested as this.)

MAAV is mean activated absolute value. I will absolutely get the two As there backwards sometimes, just a heads up; it means the same thing. To calculate this I take a subset of training datapoints and pass them through the network. As each weight is multiplied by an input, it gives me an activated value. I take the absolute value of each individual activated value, and then take the mean of these over the datapoints.

The steps (there is a code sketch after the list):

1. Initialize a zero matrix matching each weight matrix; I call these stats matrices.

2. Multiply the input elements of one datapoint by the corresponding weights in the first matrix.

3. Take the absolute value of this matrix and add it to the first stats matrix.

4. Take the matmul of this datapoint with the weights, add the bias and apply the activation function. Normal neural network things.

5. Multiply the outputs of layer one by the corresponding weights in the second matrix.

6. Take the absolute value of this matrix and add it to stats matrix 2.

7. Continue these steps through the whole model.

8. Repeat for as many datapoints as it takes to make you feel good, or until just before your GPU overheats because it is 35 fucking degrees again.

9. Divide the stats matrices by the number of datapoints you used.
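Here is what those steps look like in code. This is a minimal NumPy sketch, not Sparana's actual implementation: it assumes a plain fully connected network stored as lists of weight matrices (shaped inputs × outputs) and bias vectors, with the same activation at every layer.

```python
import numpy as np

def maav(weights, biases, datapoints, activation=lambda z: np.maximum(z, 0)):
    """Mean activated absolute values: one stats matrix per weight matrix.

    weights[i]: (in_dim, out_dim) array; datapoints: (n, in_dim) array.
    """
    # Step 1: a zero stats matrix for each weight matrix
    stats = [np.zeros_like(w) for w in weights]
    for x in datapoints:  # one datapoint at a time, which is the slow part
        a = x
        for i, (w, b) in enumerate(zip(weights, biases)):
            # Steps 2/5: each input element times the weights it feeds,
            # giving every individual activated value before any summing
            activated = a[:, None] * w
            # Steps 3/6: accumulate the absolute activated values
            stats[i] += np.abs(activated)
            # Step 4: the normal forward pass to get the next layer's input
            a = activation(a @ w + b)
    # Step 9: average over the number of datapoints used
    return [s / len(datapoints) for s in stats]
```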

This can be quite slow because each datapoint needs to be passed through the system individually. Training is fast because it uses matrix multiplication optimized for large batches, but a matmul multiplies and then immediately sums, and I need the individual multiplied values before they are summed. Step 2 multiplies, then I have to use that data… slow, and step 4 does a matmul… fast.
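The contrast is easy to see in NumPy. This is just an illustration of the bottleneck, with `a` and `w` being one layer's input vector and weight matrix as in the sketch above:

```python
# A matmul fuses the multiply and the sum into one optimized step: fast,
# but the individual products are never materialized.
out = a @ w

# MAAV needs those products, so it builds a full (in_dim, out_dim)
# intermediate for every single datapoint before summing: slow.
products = a[:, None] * w
out = products.sum(axis=0)  # same result as a @ w
```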

I started using this method because I thought it would be a good way of finding the parameters that are being used most by the model. The only other method I had read about at the time, and the method I compare this to, is ranking weights by their absolute values. My reasoning was that some weights might have a large value but mostly get zero or very small inputs, and some might be small but consistently get large inputs. Weights that are initialized with large values but have no influence on the output will not be made smaller by backpropagation. They will just sit there being big, doing nothing.
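A toy example of the difference, with made-up numbers: a big weight that mostly sees near-zero inputs ranks first by absolute value but last by MAAV.

```python
import numpy as np

w = np.array([5.0, 0.3])            # weight 0 is big, weight 1 is small
inputs = np.array([[0.01, 2.0],     # but weight 0 mostly sees ~0 inputs
                   [0.02, 1.5],
                   [0.00, 1.8]])

abs_rank = np.abs(w)                      # [5.0, 0.3]   -> keeps weight 0
maav = np.abs(inputs * w).mean(axis=0)    # [0.05, 0.53] -> keeps weight 1
```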

The downside of this is that weights that are only occasionally used will be removed in pruning, but might be useful in a small number of cases, and we want the model to work in those cases too. I could design something to deal with this, but for now MAAV is automated and uniform, and works well in a lot of cases. The code for this can be found in the lobotomizer.py file. I don't have a dedicated notebook demonstrating how to use this, because most of them use it.
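To make the pruning connection concrete, here is one way MAAV scores could drive pruning. This is a generic sketch, not lobotomizer.py's actual interface: zero out the weights whose stats-matrix scores fall below a per-layer quantile.

```python
import numpy as np

def prune_by_maav(weights, stats, keep_fraction=0.5):
    """Zero the weights with the lowest MAAV scores, layer by layer."""
    pruned = []
    for w, s in zip(weights, stats):
        cutoff = np.quantile(s, 1.0 - keep_fraction)  # per-layer threshold
        pruned.append(np.where(s >= cutoff, w, 0.0))  # keep the high scorers
    return pruned
```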

The real advantage of this method might be in passing different data through the model to find which parameters are associated with different behaviours. See the forgetting blogs for more on this.

