👋About Me
📖Education
Sep. 2025 - Jun. 2030 (Expected) Ph.D., Statistics and Data Science, University of California, Los Angeles, USA.
Aug. 2022 - Jun. 2025 M.Sc., Data Science and Information Technology, Tsinghua University, Beijing, China.
GPA: 3.98/4.0, Top 3%
Sep. 2018 - Jun. 2022 B.Sc., Electronic Information Science and Technology, Sun Yat-sen University, Guangzhou, China.
GPA: 4.11/5.0, Top 3%
📑Selected Publications
Xiao Liang*, Zhong-Zhi Li*, Yeyun Gong, Yang Wang, Hengyuan Zhang, Yelong Shen, Ying Nian Wu, Weizhu Chen
Preprint 2025, [Paper] [Code]
We introduce a Self-aware Weakness-driven problem Synthesis framework that identifies and leverages model weaknesses for problem augmentation in RLVR.

Zhong-Zhi Li*, Xiao Liang*, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Ying Nian Wu, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu
Preprint 2025, [Paper] [Code]
We propose a dynamic ratio-based training pipeline that balances the model’s System-1 and System-2 data to reduce redundant reasoning.

Xumeng Wen*, Zihan Liu*, Shun Zheng*, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang*, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang
Preprint 2025, [Paper]
We introduce a stricter reasoning-aware metric than pass@k and a supporting theoretical foundation to show that RLVR uniquely incentivizes logically consistent reasoning and generalizes across tasks.

Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong
ICLR 2025, [Paper]
This paper presents a self-consistency-based decoding strategy for improving the factual accuracy of large language models, especially in long-form generation tasks.

Xiao Liang*, Xinyu Hu*, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao
EMNLP 2024, [Paper]
We propose a task-oriented in-domain data augmentation framework consisting of in-domain data selection and task-oriented synthetic passage generation.

Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du
ACL 2024, [Paper] [Code]
We propose a token selection framework that uses reinforcement learning to help pre-trained transformers process long sequences.

Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, Ziyi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei
ACL 2025, [Paper]
We introduce a framework that integrates Natural Language Reasoning, Algorithmic Reasoning, and Symbolic Reasoning to enable synergistic collaboration for LLMs.
(* indicates equal contribution)
🧑‍💻Experience
(Nov. 2023 - Present) Research Intern, NLC Group, Microsoft Research Asia, Beijing, China.
Mentors: Yeyun Gong, Weizhu Chen
Working on large language models, pre-training strategies, and model architectures.
(Mar. 2023 - Sep. 2023) Research Intern, AI Lab, Tencent Inc., Guangdong, China.
Mentors: Pengyu Cheng, Nan Du
Working on large language models and long-sequence processing.
🏆Honors and Awards
- Outstanding Master Graduate Thesis, Tsinghua University, Beijing, 2025
- Second Prize Scholarship, Tsinghua University, Beijing, 2024
- Outstanding Graduate Thesis, Sun Yat-sen University, Guangdong, 2022
- Outstanding Graduate Student, Sun Yat-sen University, Guangdong, 2022
- Second Prize Scholarship, Sun Yat-sen University, Guangdong, 2019-2022
- First Place, 2022 Tsinghua Open Hack Competition (Multimodal Learning Track), Tsinghua University, Beijing, 2022