Implement Switching Between Learning and Inference-Focused Policies
Currently, a Monty system cannot flexibly switch between a learning-focused policy (such as the naive scan policy) and an inference-focused policy. Enabling LMs to guide such a switch based on their internal models, and whether they are in a matching or exploration state, would be a useful improvement.
This would be a specific example of a more general mechanism for switching between different policies, as discussed in Switching Policies via Goal States.
Similarly, an LM should be able to determine the most appropriate model-based policies to initialize, such as the hypothesis-testing policy vs. a top-down exploration policy.
Updated 8 days ago