Implement Switching Between Learning and Inference-Focused Policies

Currently, a Monty system cannot flexibly switch between a learning-focused policy (such as the naive scan policy) and an inference-focused policy. A useful improvement would be to let learning modules (LMs) guide this switch, based on their internal models and on whether they are in a matching or an exploration state.
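As a rough illustration of what such a switch might look like, here is a minimal sketch. It is not Monty's API: `LMState`, `PolicyKind`, and `select_policy` are hypothetical names, and the rule itself (use an inference-focused policy only when the LM is matching and already has learned models) is an assumption made for the example.

```python
from enum import Enum


class LMState(Enum):
    """Hypothetical LM states, named after the two states mentioned above."""

    MATCHING = "matching"
    EXPLORING = "exploring"


class PolicyKind(Enum):
    """Hypothetical labels for the two families of policy."""

    INFERENCE = "inference"
    LEARNING = "learning"  # e.g. the naive scan policy


def select_policy(lm_state: LMState, has_learned_models: bool) -> PolicyKind:
    """Pick a policy family from the LM's state and its internal models.

    Assumption: an inference-focused policy is only useful when the LM is
    matching and has at least one learned model to match against. In every
    other case we fall back to a learning-focused policy.
    """
    if lm_state is LMState.MATCHING and has_learned_models:
        return PolicyKind.INFERENCE
    return PolicyKind.LEARNING
```

For example, `select_policy(LMState.EXPLORING, True)` returns `PolicyKind.LEARNING` under this rule, since an exploring LM is assumed to be extending its model rather than testing it.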

This would be a specific example of a more general mechanism for switching between different policies, as discussed in Switching Policies via Goal States.

Similarly, an LM should be able to determine the most appropriate model-based policies to initialize, such as the hypothesis-testing policy vs. a model-based exploration policy.
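One way to picture this choice is the sketch below. The function name, its parameters, and the decision rule are all hypothetical: it assumes that hypothesis testing is worthwhile only while the LM is matching and more than one hypothesis remains, and that model-based exploration is the fallback otherwise. The real criteria are an open design question.

```python
def choose_model_based_policy(is_matching: bool, num_hypotheses: int) -> str:
    """Return the name of the model-based policy to initialize.

    Assumptions (illustrative only):
    - `is_matching` says whether the LM is in a matching state.
    - `num_hypotheses` is the number of hypotheses the LM still considers
      plausible.
    """
    if num_hypotheses < 0:
        raise ValueError("num_hypotheses must be non-negative")
    # Several competing hypotheses during matching: move to disambiguate them.
    if is_matching and num_hypotheses > 1:
        return "hypothesis_testing"
    # Otherwise use the model to guide exploration.
    return "model_based_exploration"
```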
