Sitong Wu | 吴思彤  

PhD (2nd Year)

Department of Computer Science and Engineering
The Chinese University of Hong Kong (CUHK)

Email: stone-wu@link.cuhk.edu.hk


Biography

I am currently a 1st year PhD student at Computer Science & Engineering Department, The Chinese University of Hong Kong (CUHK), under the supervision of Prof. Jiaya Jia and Prof. Bei Yu. Before that, I got the Bachelor degree in Information Engineering at Jilin University (JLU).

My recent research revolves around efficient AI (including data, model, training and inference efficiency), particularly exploring its applications in large language model (LLM) and multi-modal large language model (MLLM).

Selected Publications [Google Scholar]

*: equal contribution
SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training
Sitong Wu*, Haoru Tan*, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
paper
Data Pruning via Moving-one-Sample-out
Haoru Tan*, Sitong Wu*, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, Xiaojuan Qi
Conference on Neural Information Processing Systems (NeurIPS), 2023
arXiv code
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo
Association for the Advancement of Artificial Intelligence (AAAI), 2022, Oral
paper arXiv code
Semantic Diffusion Network for Semantic Segmentation
Haoru Tan*, Sitong Wu*, Jimin Pi
Conference on Neural Information Processing Systems (NeurIPS), 2022
paper arXiv code
Demystify Transformers & Convolutions in Modern Image Deep Networks
Xiaowei Hu*, Min Shi*, Weiyun Wang*, Sitong Wu*, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai
arXiv: 2211.05781
arXiv code
Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu*, Tianyi Wu*, Fangjian Lin*, Shengwei Tian, Guodong Guo
arXiv: 2106.04108
arXiv code
Full-scale Selective Transformer for Semantic Segmentation
Fangjian Lin*, Sitong Wu*, Yizhe Ma, Shengwei Tian
Asian Conference on Computer Vision (ACCV), 2022
paper
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibin Wang, Fan Wang
arXiv: 2308.02299
arXiv code
Ensemble Quadratic Assignment Network for Graph Matching
Haoru Tan, Chuang Wang, Sitong Wu, Xuyao Zhang, Fei Yin, Chenglin Liu
International Journal on Computer Vision (IJCV), 2024
arXiv
Proxy Graph Matching with Proximal Matching Networks
Haoru Tan, Chuang Wang, Sitong Wu, Tieqiang Wang, Xuyao Zhang, Chenglin Liu
Association for the Advancement of Artificial Intelligence (AAAI), 2021
paper
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin, Jianlong Yuan, Sitong Wu, Fan Wang, Zhibin Wang
ACM Multimedia Conference (ACM MM), 2023
paper arXiv code
StructToken: Rethinking Semantic Segmentation with Structural Prior
Fangjian Lin*, Zhanhao Liang*, Sitong Wu, Junjun He, Kai Chen, Shengwei Tian
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2023
paper arXiv code
PRSeg: A Lightweight Patch Rotate MLP Decoder for Semantic Segmentation
Fangjian Lin*, Yizhe Ma*, Sitong Wu, Long Yu, Shengwei Tian
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2023
paper arXiv
CATrans: Context and Affnity Transformer for Few-Shot Segmentation
Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
International Joint Conference on Artificial Intelligence (IJCAI), 2022
paper arXiv
AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows
Fangjian Lin, Yizhe Ma, Sitong Wu, Long Yu, Shengwei Tian
arXiv: 2305.01280
arXiv

Experiences

Honors and Awards