I'm working my way through Reinforcement Learning by Policy Search and I've come across one quirk in the definition of finite-state controllers. To me the natural definition is a straightforward extension of a reactive policy: a reactive policy maps an observation to an action, so the obvious way to add internal memory is to map the observation and the current internal state to an action. Peshkin's definition instead maps the current internal state alone to an action, ignoring the observation (the observation only drives the internal-state transition). He notes that his scheme can emulate mine by increasing the number of internal states so that there is one per pair of 'real' memory state and observation. I'm not sure either scheme has any particular advantage, though his has fewer parameters for the same number of internal states. He does, however, incorrectly state that in his scheme the degenerate case with a single internal state is a reactive policy. It is not: with only one internal state, the action cannot depend on the observation at all. In my scheme the degenerate case really is a reactive policy.
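To make the difference concrete, here is a minimal sketch of the two parameterizations and the state-blowup emulation. The class and function names are my own invention, and everything is deterministic for brevity (Peshkin's controllers are stochastic), but the structural point carries over.

```python
# Two finite-state-controller parameterizations, plus the emulation of
# one by the other.  Names and the deterministic setting are illustrative
# assumptions, not Peshkin's notation.

class ObsConditionedFSC:
    """My scheme: the action depends on (observation, internal state)."""
    def __init__(self, act, update):
        self.act = act          # (obs, mem) -> action
        self.update = update    # (obs, mem) -> next mem

    def step(self, obs, mem):
        return self.act(obs, mem), self.update(obs, mem)


class PeshkinFSC:
    """Peshkin's scheme: the memory update sees the observation,
    but the action depends on the internal state alone."""
    def __init__(self, act, update):
        self.act = act          # mem -> action
        self.update = update    # (obs, mem) -> next mem

    def step(self, obs, mem):
        mem = self.update(obs, mem)   # memory absorbs the observation first
        return self.act(mem), mem     # the action then sees memory only


def emulate(fsc, m0):
    """The blowup construction: one internal state per ('real' memory
    state, observation) pair.  None marks the start state, before any
    observation has arrived."""
    def update(obs, n):
        if n is None:                       # first step: nothing pending
            return (m0, obs)
        m, o_prev = n                       # apply the deferred update,
        return (fsc.update(o_prev, m), obs)  # then record the new obs

    def act(n):
        m, o = n
        return fsc.act(o, m)

    return PeshkinFSC(act, update)


def run(controller, mem, obs_seq):
    """Feed a sequence of observations through a controller."""
    actions = []
    for o in obs_seq:
        a, mem = controller.step(o, mem)
        actions.append(a)
    return actions


# A tiny binary example: action = obs XOR mem, next mem = last obs.
base = ObsConditionedFSC(act=lambda o, m: o ^ m, update=lambda o, m: o)
obs_seq = [1, 0, 1, 1]
print(run(base, 0, obs_seq))                  # [1, 1, 1, 0]
print(run(emulate(base, 0), None, obs_seq))   # same actions: [1, 1, 1, 0]
```

The degenerate case is visible here too: give `PeshkinFSC` a single internal state and `act` becomes a constant, so the action sequence ignores the observations, whereas a one-state `ObsConditionedFSC` still maps each observation to an action, i.e. it is exactly a reactive policy.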