Chuhan Zhang (张楚晗)
Senior Research Scientist @ Google DeepMind
Email · Google Scholar · GitHub · X / Twitter
I am a researcher working on multimodal AI, with interests spanning video understanding, dynamic 3D scene reconstruction, and automatic evaluation pipelines for generative models. I am particularly keen on building models that are simple in design yet achieve strong performance. My recent work focuses on video spatial understanding and situated awareness in Gemini.
I completed my PhD at the Visual Geometry Group (VGG), University of Oxford, advised by Andrew Zisserman. Prior to my PhD, I obtained my MEng in Engineering Science at Exeter College, University of Oxford. Please feel free to reach out by email if you'd like to discuss research or collaborate.
Selected Publications
- Efficiently Reconstructing Dynamic Scenes One D4RT at a Time. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026. Oral Award Candidate
- Scaling 4D Representations. arXiv preprint, 2024
- Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings. International Conference on Learning Representations (ICLR), 2025. Spotlight
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model. International Conference on Computer Vision (ICCV), 2023
- Is an Object-Centric Video Representation Beneficial for Transfer? Asian Conference on Computer Vision (ACCV), 2022
- Temporal Query Networks for Fine-grained Video Understanding. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Oral
- Adaptive Text Recognition through Visual Matching. European Conference on Computer Vision (ECCV), 2020