“In large-scale AI supercomputers with tens of thousands of chips, the failure of a single chip can halt the entire computation.”