“In an internal evaluation of a coding agent, a QIN3 base model demonstrated lower accuracy than GPT models.”