I'm writing up an algorithm for learning variable size finite state controllers. It is mostly an amalgamation of other people's work, but it should give me publishable results, and hence a paper, by Christmas. It's nice to be making some progress on the PhD after so long thrashing around trying to find a direction.
Thursday, June 30, 2005
This post is a constructive proof that blogs are just made of cat photos. This is the two surviving kittens being supervised by the resident big cat.
The white and grey guy is eating well and looking like he will recover, though he is currently 200g lighter than his brother. He doesn't think much of his brother beating him up, though this is the most fight he has shown in days, so it's a good thing in some ways.
Update: Bree insists both kits are black and white. She's a vet; she wins this one.
Wednesday, June 29, 2005
About 2 weeks ago Bree brought home some kittens that had been found behind a shop. They needed socialisation and looking after; if they stayed at the RSPCA they'd probably get ill, and the nurses wouldn't have enough time to give them the attention they need. As we took last week off to fix up things around the house it seemed like a good fit.
You can see them in that photo looking very adorable. Now, two of them are dead, and one is slowly recovering from illness. It has been harrowing. One night we stayed up until 3AM nursing one of them; an hour later it had died.
Friday, June 10, 2005
I'm working my way through Reinforcement Learning by Policy Search and I've come across one quirk in the definition of finite-state controllers. To me the natural definition is a straightforward extension of a reactive policy. A reactive policy maps an observation to an action. The straightforward extension to include an internal memory would be to map the observation and the current internal state to an action. Peshkin's definition maps the current internal state alone to an action, ignoring the observation (the observation only influences the internal-state transitions). He notes that his scheme can emulate mine by increasing the number of internal states so there is one per 'real' memory state and observation pair. I'm not sure there is any particular advantage to either scheme, though his has fewer parameters for the same number of internal states. He does incorrectly state that in his scheme the degenerate case with a single internal state is a reactive policy. That is not so: with one internal state his controller ignores the observation entirely and always emits the same action, whereas in my scheme a single internal state still conditions on the observation, so it does reduce to a reactive policy.
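The difference in the degenerate case can be sketched in a few lines. This is a toy illustration, not Peshkin's actual formulation: the parameter tables and function names are made up, and I've hard-coded tiny deterministic tables (one memory state, two observations, two actions) just to show which scheme can still react to the observation.

```python
import numpy as np

# Degenerate case: a single internal memory state.
n_mem, n_obs, n_act = 1, 2, 2

# Observation-conditioned scheme: action scores indexed by
# (memory state, observation). Illustrative values only.
theta_obs = np.array([[[1.0, 0.0],    # memory 0, observation 0 -> action 0
                       [0.0, 1.0]]])  # memory 0, observation 1 -> action 1

# Memory-only scheme: action scores indexed by memory state alone;
# the observation would only drive memory transitions, not actions.
theta_mem = np.array([[1.0, 0.0]])    # memory 0 -> action 0, always

def act_obs(mem, obs):
    # Action depends on both memory and observation.
    return int(np.argmax(theta_obs[mem, obs]))

def act_mem(mem, obs):
    # Action depends on memory only; obs plays no role here.
    return int(np.argmax(theta_mem[mem]))

print(act_obs(0, 0), act_obs(0, 1))  # 0 1 -- varies with observation
print(act_mem(0, 0), act_mem(0, 1))  # 0 0 -- constant action
```

With one memory state the observation-conditioned controller is exactly a reactive policy, while the memory-only controller collapses to a constant action, which is the point of the quibble above.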