Forgetting is removing parameters from a neural network to change how it behaves. This blog is not really an introduction to forgetting; it is an intro to the blog series that I am going to be working on while I develop the idea and the methods. My last series of blogs took too long because I tried to plan it out, but had to keep doing more experiments. A time-consuming and poorly planned way of working. Here I will be a bit looser about putting up blogs: experiments that are small in scope, with quick discussions in between. Hopefully this will mean faster work from me. The trade-off is that I might have to repeat myself a bit, and will make mistakes along the way. It might be a cynical way to get people to come back to the website: "Hey, why don’t you come back and see if I fixed that mistake from a few weeks ago?" I will be developing concepts and methods along the way. I will summarise them at the end, but the process might be interesting in itself.
I can use my software to remove quite a lot of parameters without really making any difference to the performance of a model. This is not new; pruning has been shown to work for a long time. A relatively small number of parameters do all the heavy lifting, and if I remove these important parameters the model fails very quickly. I am going to see if there is a small number of parameters that I can remove that will change the behaviour of the model without causing it to fail.
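To make "pruning" concrete, here is a minimal sketch in PyTorch. This is an illustration, not my actual software: it assumes the simplest possible criterion (zero the smallest-magnitude weights, pooled across all layers) and a model small enough that gathering every weight into one tensor is cheap.

```python
import torch

def magnitude_prune(model: torch.nn.Module, fraction: float) -> None:
    """Zero the `fraction` of weights with the smallest absolute value.

    Weights are pooled across all layers; biases are left alone.
    """
    weights = [p for name, p in model.named_parameters() if name.endswith("weight")]
    magnitudes = torch.cat([p.detach().abs().flatten() for p in weights])
    threshold = torch.quantile(magnitudes, fraction)
    with torch.no_grad():
        for p in weights:
            p.mul_((p.abs() > threshold).to(p.dtype))
```

With a criterion this crude you can often remove a surprisingly large fraction of weights before accuracy moves, which is the uninteresting half of the story; the interesting half is picking the few parameters whose removal changes behaviour on purpose.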
I am going to start with a blog about a simple experiment: trying to get a fully connected neural network trained on MNIST to forget one output class by pruning parameters. There are a few things that need to be considered before going into too much depth. I will aim to prune as few parameters as possible, and will not prune the first or last layer. These choices will be easier to explain after an experiment showing that this works.
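How I pick which parameters to prune is the interesting part, and that comes in later blogs. For now, here is a sketch of the scaffolding around the experiment: a plain MNIST MLP (the layer sizes are placeholders, not my actual architecture) and a per-class accuracy check, which is how I will tell whether one class has been forgotten while the others survive.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Plain fully connected net for MNIST."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 256), nn.ReLU(),  # first layer: left untouched
            nn.Linear(256, 256), nn.ReLU(),  # middle layer: pruning happens here
            nn.Linear(256, 10),              # last layer: left untouched
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def per_class_accuracy(model, loader, n_classes=10):
    """Accuracy broken out per digit, so forgetting one class is visible."""
    correct = torch.zeros(n_classes)
    total = torch.zeros(n_classes)
    for x, y in loader:
        pred = model(x).argmax(dim=1)
        for c in range(n_classes):
            mask = y == c
            correct[c] += (pred[mask] == c).sum()
            total[c] += mask.sum()
    return correct / total.clamp(min=1)
```

Success would look like one entry of that vector collapsing while the other nine stay roughly where they were.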
I will try this experiment later with a couple of different methods for evaluating parameters. Before then I will need to understand what is going on internally when the model is pruned in this way. I will look at outputs and activations. Between blogs dedicated to the experiments, methods, and model behaviour, I will talk about what I think is happening. I will introduce some concepts that help me understand things. I will try to keep the speculation and half-developed concepts in separate blogs from those talking about methods and results.
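For the "look at activations" part, forward hooks are the obvious tool. A rough sketch, assuming the placeholder MLP above (hooking the ReLU layers is just my choice for illustration):

```python
import torch
import torch.nn as nn

def capture_activations(model: nn.Module) -> dict:
    """Record the output of every ReLU on each forward pass."""
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            module.register_forward_hook(make_hook(name))
    return activations
```

Run a batch through the model and the returned dict holds the hidden-layer outputs, ready to compare before and after pruning.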
I have to be honest: I don’t know what anyone else is doing in this field. This is my hobby; I have another job, and I don’t really have time. If there is something you think is relevant, let me know. If there is something similar and I have not cited it, it is because I have not read it, but I might find it interesting, so let me know. I am publishing this hoping to get more engaged with what else is going on in the field.
I think that is all I need to cover in an introduction, so come and join me for an adventure in stream-of-consciousness science.
Next
This is the one: Second blog, first experiment.