The architecture of Physical Intelligence's models is a vision-language model (VLM) combined with..., Sonic AI
“The architecture of Physical Intelligence's models is a vision-language model (VLM) combined with a separate ‘action model’ component that translates instructions and visual input into robot commands.”
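
The two-part split described in the quote can be illustrated with a minimal sketch. It is not Physical Intelligence's implementation: every name here (`Observation`, `ToyVLM`, `ToyActionModel`, `run_policy`) is hypothetical, and the "features" and "commands" are toy arithmetic standing in for learned networks. It only shows the data flow, assuming the VLM produces a feature vector from an image and an instruction, and a separate action component turns that vector into bounded robot commands.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Observation:
    """One camera frame (flattened pixel intensities) plus a text instruction."""
    pixels: Sequence[float]
    instruction: str


class ToyVLM:
    """Hypothetical stand-in for a vision-language model."""

    def encode(self, obs: Observation) -> List[float]:
        # Toy features (an assumption, not a real encoder):
        # mean pixel intensity and the instruction's word count.
        mean_pixel = sum(obs.pixels) / len(obs.pixels) if obs.pixels else 0.0
        word_count = float(len(obs.instruction.split()))
        return [mean_pixel, word_count]


class ToyActionModel:
    """Hypothetical stand-in for the separate action component."""

    def __init__(self, num_joints: int = 3) -> None:
        self.num_joints = num_joints

    def act(self, features: Sequence[float]) -> List[float]:
        # Toy mapping (an assumption): one command per joint, clipped to [-1, 1].
        scale = sum(features)
        return [
            max(-1.0, min(1.0, scale * 0.1 * (joint + 1)))
            for joint in range(self.num_joints)
        ]


def run_policy(obs: Observation, vlm: ToyVLM, action_model: ToyActionModel) -> List[float]:
    """Chain the two components: perception and language first, then actions."""
    features = vlm.encode(obs)
    return action_model.act(features)


obs = Observation(pixels=[0.0, 1.0], instruction="pick up the cup")
commands = run_policy(obs, ToyVLM(), ToyActionModel())
```

Keeping the action component behind its own `act` method mirrors the separation the quote describes: the language-and-vision side and the command-producing side can be swapped independently in this sketch.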