“The use of external tools is a critical component for improving the effectiveness of test-time scaling in reasoning models.”