https://old.reddit.com/r/singularity/comments/19dzmul/spatialvlm_endowing_visionlanguage_models_with