Chuhan Zhang (张楚晗)

Senior Research Scientist @ Google DeepMind

Email · Google Scholar · GitHub · X / Twitter

I am a researcher working on multimodal AI, with interests spanning video understanding, dynamic 3D scene reconstruction, and automatic evaluation pipelines for generative models. I am particularly keen on building models that are simple in design yet achieve strong performance. My recent work focuses on video spatial understanding and situated awareness in Gemini.

I completed my PhD at the Visual Geometry Group (VGG), University of Oxford, advised by Andrew Zisserman. Prior to my PhD, I obtained my MEng in Engineering Science at Exeter College, University of Oxford. Please feel free to reach out by email if you'd like to discuss research or collaborate.

Selected Publications

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Chuhan Zhang, Guillaume Le Moing, Skanda Koppula, Ignacio Rocco, …, Andrew Zisserman, Junlin Zhang, Mehdi S. M. Sajjadi

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026 (Oral, Best Paper Award)

PDF · arXiv · Project · Blog

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Olivia Wiles*, Chuhan Zhang*, Isabela Albuquerque*, …, Aida Nematzadeh
*Equal contribution

International Conference on Learning Representations (ICLR), 2025 (Spotlight)

arXiv · OpenReview

Scaling 4D Representations

João Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, …, Andrew Zisserman

arXiv preprint, 2024

arXiv · Code

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

Chuhan Zhang, Ankush Gupta, Andrew Zisserman

International Conference on Computer Vision (ICCV), 2023

PDF · Code

Is an Object-Centric Video Representation Beneficial for Transfer?

Chuhan Zhang, Ankush Gupta, Andrew Zisserman

Asian Conference on Computer Vision (ACCV), 2022

PDF · arXiv

Temporal Query Networks for Fine-grained Video Understanding

Chuhan Zhang, Ankush Gupta, Andrew Zisserman

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 (Oral)

PDF · arXiv · Code · Project

Adaptive Text Recognition through Visual Matching

Chuhan Zhang, Ankush Gupta, Andrew Zisserman

European Conference on Computer Vision (ECCV), 2020

PDF · Project · Code