Cognitive Model of Two Choice Task
- We used reinforcement learning within the system.
- One of the problems is that if the system chooses a bad rule,
the neurons that make the choice and the neurons that encode the rule still co-fire.
- Via Hebbian learning, those connections are then strengthened anyway.
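The failure mode above can be sketched in a few lines. This is a minimal illustration, not the model's actual learning rule: plain Hebbian learning strengthens the synapse between any pre- and postsynaptic units that fire together, regardless of whether the chosen rule was good. The function name and rates here are assumptions for illustration.

```python
def hebbian(w, pre, post, eta=0.1):
    """Plain Hebb: the weight grows whenever pre and post fire together."""
    return w + eta * pre * post

# Suppose the system picks a *bad* rule: its neurons still co-fire
# with the choice neurons, so the synapse strengthens anyway.
w_choice_rule = 0.0
for _ in range(5):
    w_choice_rule = hebbian(w_choice_rule, pre=1.0, post=1.0)

print(w_choice_rule)  # 0.5 -- grows even though the rule was bad
```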

- The explore subnet stimulates the choice of an action (a). By itself
(when the precondition (s) holds), it chooses randomly
between the available actions.
- Without it, the current action persists.
- The value net inhibits explore: when a reward arrives,
exploration stops and the rewarded behaviour is reinforced.
- This mechanism works for verb learning (see the CABots)
and for the two-choice task cognitive model.
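The explore/value interaction can be sketched as a tiny simulation. This is a hedged sketch, not the actual spiking model: one state, two actions, a scalar value weight per action standing in for the value net, and a threshold standing in for its inhibition of the explore subnet. All names and constants are assumptions.

```python
import random

random.seed(0)

REWARD_ACTION = 0     # assumed: action 0 is the one that pays off
w = [0.0, 0.0]        # value-net weights: learned value of each action
eta = 0.2             # learning rate for the Hebbian-style update
threshold = 0.5       # value above this inhibits the explore subnet

action = random.choice([0, 1])   # initial action
for trial in range(50):
    reward = 1.0 if action == REWARD_ACTION else 0.0
    # chosen (co-firing) action's weight moves toward the reward
    w[action] += eta * (reward - w[action])
    if w[action] <= threshold:
        # explore subnet active: pick randomly between the actions
        action = random.choice([0, 1])
    # else: value net inhibits explore, so the rewarded action persists

print(action, [round(v, 2) for v in w])
```

Once the rewarded action's value crosses the threshold, exploration is inhibited and the same action repeats, so its weight keeps growing while the other stays flat.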

- In the figure there is a slope-1 line representing the input, i.e. how
likely a given choice is to be rewarded. Human subjects follow the solid line,
our model the dashed one.
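The slope-1 line corresponds to probability matching: choosing each option at the rate it is rewarded. A minimal sketch of how that curve can arise, assuming (this is an illustration, not the model's mechanism) an agent that samples its choice in proportion to running reward estimates:

```python
import random

random.seed(1)

p = [0.7, 0.3]      # assumed reward probabilities of the two options
est = [0.5, 0.5]    # running reward estimates
picks = [0, 0]

for t in range(20000):
    # choose option 0 with probability proportional to its estimate
    a = 0 if random.random() < est[0] / (est[0] + est[1]) else 1
    r = 1.0 if random.random() < p[a] else 0.0
    est[a] += 0.01 * (r - est[a])
    picks[a] += 1

match = picks[0] / sum(picks)
print(round(match, 2))   # close to p[0] / (p[0] + p[1]) = 0.7
```

Since the estimates converge to the true reward rates, the choice frequency tracks the relative reward probability, which traces out the slope-1 line.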