“Anthropic built ‘river crossing problems’ into its system prompts to prevent its models from making embarrassing reasoning errors.”