Vision-Language-Action (VLA) models have emerged as a promising paradigm for robot learning, but the