About me

I am currently a PhD student at the University of Edinburgh, supervised by Prof. Timothy Hospedales and Dr. Yongxin Yang, and funded by the UKRI CDT in Biomedical AI. I obtained my BSc in Computer Science from Tongji University in 2021.

I am broadly interested in machine learning and its applications in biomedicine, especially multi-modal learning and large vision-language models. Feel free to drop me an email about potential collaborations!

👉 I am actively looking for a research intern position this year. Shoot me an email if you think I am a good fit! (Flexible on location and timing.)

Research Interests

  • Machine Learning: Multimodal Learning, Algorithmic Fairness, Self-supervised Learning.
  • AI4Healthcare: Computational Biology, Medical Imaging.

News

[01/2024] Giving a talk on Fool Your (V)LLMs at the BMVA Vision-Language Symposium!
[11/2023] Invited talk on MEDFAIR at the FAIMI workshop!
[02/2023] Meta-Omnium is accepted to CVPR’23!
[01/2023] MEDFAIR is accepted to ICLR’23 as spotlight!

Publications / Preprints

Check my Google Scholar.

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. [website][paper][code]
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, Timothy Hospedales.
arXiv 2024.

Fool Your Large (Vision and) Language Model With Embarrassingly Simple Permutations.
Yongshuo Zong, Tingyang Yu, Bingchen Zhao, Ruchika Chavhan, Timothy Hospedales.
arXiv 2023.

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models. [paper][code]
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Xin Wen, Yongshuo Zong, Bingchen Zhao.
arXiv 2023.

Self-Supervised Multimodal Learning: A Survey. [paper][Github]
Yongshuo Zong, Oisin Mac Aodha, Timothy Hospedales.
arXiv 2023.

Meta Omnium: A Benchmark for General-Purpose Learning-to-learn. [website][paper][code]
Ondrej Bohdal, Yinbing Tian, Yongshuo Zong, Ruchika Chavhan, Da Li, Henry Gouk, Li Guo, Timothy Hospedales.
Computer Vision and Pattern Recognition (CVPR 2023).

MEDFAIR: Benchmarking Fairness for Medical Imaging. [paper][code][website][docs]
Yongshuo Zong, Yongxin Yang, Timothy Hospedales.
International Conference on Learning Representations (ICLR 2023 Spotlight).

conST: An Interpretable Multi-modal Contrastive Learning Framework for Spatial Transcriptomics. [paper][code]
Yongshuo Zong, Tingyang Yu, Xuesong Wang, Yixuan Wang, Zhihang Hu, Yu Li.
bioRxiv 2022.

scMinerva: A GCN-Featured Interpretable Framework for Single-cell Multi-omics Integration with Random Walk on Heterogeneous Graph. [paper][code]
Tingyang Yu, Yongshuo Zong, Yixuan Wang, Xuesong Wang, Yu Li.
bioRxiv 2022.