In TPU v1, quantify the effect of FP32→int8 with per-channel vs per-tensor quantization and histogram vs MSE calibration on performance/watt, E2E latency, and accuracy, considering HBM, MXU tiling, and rescaling overhead.
Thursday, 04 December 2025
by JOSE ALFONSIN PENA
The effect of quantization approaches, specifically FP32 to int8 with per-channel versus per-tensor schemes and histogram versus mean squared error (MSE) calibration, on Google TPU v1 performance and accuracy is multifaceted. Quantization granularity, calibration technique, MXU tiling, memory bandwidth, and overheads such as per-channel rescaling all interact, and they must be analyzed together to understand their combined influence on performance per watt, end-to-end latency, and model accuracy.
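To make the granularity and calibration trade-off concrete, below is a minimal NumPy sketch (illustrative only, not TPU-specific; the toy weight tensor, grid-search range, and helper names are assumptions for this example). It compares per-tensor max-based scaling against per-channel scales chosen by an MSE grid search over clipping thresholds, and reports the reconstruction error each scheme leaves behind.

```python
# Minimal sketch (NumPy only, hypothetical example): per-tensor vs per-channel
# symmetric int8 quantization, with per-channel scales picked by an MSE grid
# search over clipping thresholds.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, scale):
    """Symmetric int8 quantize/dequantize with a given scale (scalar or per-channel)."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale  # dequantized reconstruction

def mse_scale(w, steps=80):
    """Grid-search the clipping threshold that minimizes reconstruction MSE."""
    amax = np.abs(w).max()
    best_scale, best_err = amax / 127.0, np.inf
    for frac in np.linspace(0.3, 1.0, steps):
        scale = frac * amax / 127.0
        err = np.mean((w - quantize(w, scale)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Toy weight tensor: (out_channels, in_features). Channel 0 has a much wider
# dynamic range, which a single per-tensor scale must accommodate globally.
w = rng.normal(0, 0.02, size=(64, 512))
w[0] *= 20.0

# Per-tensor: one max-based scale for the whole tensor.
s_tensor = np.abs(w).max() / 127.0
err_tensor = np.mean((w - quantize(w, s_tensor)) ** 2)

# Per-channel: one MSE-calibrated scale per output channel (axis 0).
s_channel = np.array([mse_scale(row) for row in w])[:, None]
err_channel = np.mean((w - quantize(w, s_channel)) ** 2)

print(f"per-tensor  MSE: {err_tensor:.3e}")
print(f"per-channel MSE: {err_channel:.3e}")  # typically much lower on this tensor
```

On a tensor whose channels have very different dynamic ranges, the per-channel MSE-calibrated scales typically leave far lower reconstruction error, which is the accuracy side of the trade-off; the cost is storing and applying one rescale factor per output channel after accumulation, which is the rescaling overhead referred to in the question.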

