vision-x-nyu.github.io

Thinking in space: how multimodal LLMs see, remember and recall

How emerging spatial reasoning and local world-modeling capabilities remain subhuman but promising.