Yongshuo Zong


EH8 9AB

Edinburgh, UK

I am currently a PhD student at the University of Edinburgh, supervised by Prof. Timothy Hospedales and Dr. Yongxin Yang, and funded by the UKRI CDT in Biomedical AI. I obtained my BSc in Computer Science from Tongji University in 2021.

I am broadly interested in machine learning and its applications in healthcare, with a particular focus on multi-modal learning and large vision-language models.

Feel free to drop me an email for potential collaborations!

news

Sep 03, 2024 Started my internship at Amazon AWS AI!
Jul 11, 2024 Survey on Self-supervised Multimodal Learning is accepted to IEEE T-PAMI!
May 01, 2024 Both VLGuard and Fool your (V)LLMs are accepted to ICML’24!
Apr 24, 2024 Giving a talk about VLGuard at the BMVA Trustworthy Multimodal Foundation Models Symposium!
Feb 27, 2024 C-VQA is accepted to CVPR’24!
Jan 17, 2024 Giving a talk about Fool your (V)LLMs at the BMVA Vision-Language Symposium!
Nov 06, 2023 Invited talk about MEDFAIR at the FAIMI workshop!
Feb 27, 2023 Meta-Omnium is accepted to CVPR’23!
Jan 21, 2023 MEDFAIR is accepted to ICLR’23 as a spotlight!

selected publications/preprints

  1. Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
    Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, and Timothy Hospedales
    ICML, 2024
    TL;DR: VLLM fine-tuning breaks LLM safety, but our VLGuard can fix this.
  2. Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
    Yongshuo Zong, Tingyang Yu, Bingchen Zhao, Ruchika Chavhan, and Timothy Hospedales
    ICML, 2024
    TL;DR: (V)LLM-based multiple-choice QA is not robust to permutations of the answer options.
  3. What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
    Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, and 1 more author
    CVPR, 2024
    TL;DR: Vision large language models do not understand counterfactual conditions well.
  4. Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
    Ondrej Bohdal, Yinbing Tian, Yongshuo Zong, Ruchika Chavhan, Da Li, and 3 more authors
    CVPR, 2023
    TL;DR: A framework for consistently evaluating meta-learners across a variety of vision tasks.
  5. MEDFAIR: Benchmarking Fairness for Medical Imaging
    Yongshuo Zong, Yongxin Yang, and Timothy Hospedales
    ICLR, 2023
    TL;DR: We develop a fairness benchmark for medical imaging and find that state-of-the-art bias mitigation algorithms do not significantly outperform ERM.
  6. VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
    Yongshuo Zong, Ondrej Bohdal, and Timothy Hospedales
    arXiv preprint, 2024
    TL;DR: VL-ICL Bench is a better multimodal ICL benchmark than VQA and captioning.
  7. Self-Supervised Multimodal Learning: A Survey
    Yongshuo Zong, Oisin Mac Aodha, and Timothy Hospedales
    IEEE T-PAMI, 2024
    TL;DR: Systematic review of self-supervised multimodal learning methods.
  8. conST: an interpretable multi-modal contrastive learning framework for spatial transcriptomics
    Yongshuo Zong, Tingyang Yu, Xuesong Wang, Yixuan Wang, Zhihang Hu, and 1 more author
    bioRxiv preprint, 2022
    TL;DR: A contrastive SSL method for spatial transcriptomics representation learning.