“Harrison Chase assesses that current language models are not yet proficient enough at using web browsers for browser use to be a reliable tool for agents.”