@lain I think it may be because the LLM is giving itself more context to correctly predict the next token: predicting the correct next step of a math problem only requires information about the previous steps, whereas predicting the tokens of the correct answer from the initial problem statement alone may be much harder.