Results

CABot1 is a fairly sophisticated system, which makes it somewhat difficult to evaluate.
Moreover, each subnet is generated with some randomness, so performance varies from run to run.
Unit testing shows that, on the best nets, parsing succeeds over 99% of the time on a set of 23 sentences.
The whole system can also be evaluated.
There are four types of commands.
The simplest are direct commands (Turn left. Move forward). The best nets emit the correct commands over 90% of the time and the average is around 80%.
The compound commands (Go left., which means turn left and then move forward) are successful around 85% of the time on the best nets and 75% on average.
The one-step context-sensitive commands (Turn toward the stalactite/pyramid.) are successful around 75% of the time with the best nets and 65% on average.
The multi-step context-sensitive commands (Go to the pyramid.) are successful 50% of the time in the best case and 35% on average (with some nets always failing).
The measurements for the context-sensitive commands include failure conditions.
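As an illustration of how best-net and average figures like those above could be tallied, here is a minimal sketch. It assumes that each randomly generated net is run on a fixed set of trials, that "best" means the highest per-net success rate, and that "average" means the mean of the per-net rates. The function name, the data layout, and the example outcomes are hypothetical and are not taken from the CABot1 evaluation.

```python
from statistics import mean


def success_rates(outcomes_by_net):
    """Return (best, average) success rate across nets.

    outcomes_by_net maps a net identifier to a list of booleans,
    one per trial (True means the correct command was emitted).
    Assumption: "average" is the mean of the per-net rates.
    """
    rates = {
        net: sum(trials) / len(trials)
        for net, trials in outcomes_by_net.items()
    }
    return max(rates.values()), mean(rates.values())


# Hypothetical outcomes: three nets, four trials each.
example = {
    "net_a": [True, True, True, True],
    "net_b": [True, True, False, False],
    "net_c": [False, False, False, False],  # a net that always fails
}

best, average = success_rates(example)
print(f"best: {best:.0%}, average: {average:.0%}")
```

Reporting both figures is useful here because a net that always fails lowers the average without affecting the best case.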
In short, it works, but is far from perfect.