Zixun Huang (黄梓洵)
Undergraduate in Statistics, Peking University
Seeking Ph.D. opportunities in IEOR and Statistics (Fall 2026).
I am a fourth-year undergraduate student in Statistics at Peking University, and I recently completed an exchange program at the University of California, Berkeley. I have been fortunate to work closely with Professors Lei Wu, Zeyu Zheng, and Junfeng Hu. My research focuses on the theoretical foundations of machine learning, including scaling laws and reinforcement learning theory.
📄 Publications
* indicates equal contribution.
- Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules [Paper]
  Binghui Li*, Fengling Chen*, Zixun Huang*, Lean Wang*, Lei Wu
  NeurIPS 2025 (Spotlight)
► TL;DR: We propose a Functional Scaling Law based on an intrinsic-time view of SGD, use it to analyze learning-rate schedules, and validate the framework empirically to obtain effective training schedules.
- We introduce a Functional Scaling Law that predicts full loss trajectories under arbitrary learning-rate schedules. Using an intrinsic-time view of SGD on a power-law kernel model, we derive explicit scaling relations that unify loss dynamics across constant, decay, and warmup–stable–decay schedules.
- Experiments on 0.1B–1B LLMs show that the Functional Scaling Law accurately fits and forecasts training loss across schedules. Results confirm larger models' efficiency gains, the benefits of learning-rate decay, and the superior performance of warmup–stable–decay strategies.
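To give a concrete feel for the setup (this is not the paper's code), here is a minimal sketch: noiseless gradient descent on a toy kernel regression with power-law eigenvalues, written in the eigenbasis. It illustrates the intrinsic-time variable τ = Σ η_t that the paper builds on; the exponent, schedule shapes, and initial error are illustrative assumptions, and the gradient-noise term that makes decay schedules win in the full analysis is omitted here.

```python
import numpy as np

def loss_curve(etas, a=1.5, K=1000):
    """Gradient descent on kernel regression in the eigenbasis."""
    k = np.arange(1, K + 1)
    lam = k ** -a                # kernel eigenvalues (assumed power-law decay)
    err = np.ones(K)             # initial per-mode error coefficients (assumed)
    losses = []
    for eta in etas:
        losses.append(0.5 * np.sum(lam * err ** 2))
        err *= 1.0 - eta * lam   # one (noiseless) gradient step per mode
    return np.array(losses)

T, peak = 2000, 0.5
constant = np.full(T, peak)
wsd = np.concatenate([           # warmup -> stable -> decay, same total length
    np.linspace(0.0, peak, 200),
    np.full(T - 700, peak),
    np.linspace(peak, 0.0, 500),
])

for name, sched in [("constant", constant), ("wsd", wsd)]:
    tau = sched.sum()            # intrinsic time = cumulative learning rate
    print(f"{name}: intrinsic time {tau:.0f}, final loss {loss_curve(sched)[-1]:.4e}")
```

In this noiseless limit the final loss is governed mainly by τ, since each mode contracts roughly like exp(-λ_k τ); the interplay with SGD noise, which the Functional Scaling Law captures, is what differentiates schedules at matched intrinsic time.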
- OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning [Paper]
  Zixun Huang*, Jiayi Sheng*, Zeyu Zheng
  Submitted to AISTATS 2026; reviewer scores: 6, 5, 5 (max 7)
► TL;DR: We develop a theoretical framework for stable RL post-training by analyzing intrinsic gradient statistics, deriving SNR-based adaptive schedules, and implementing these insights in OBLR-PO to achieve more stable and performant LLM post-training.
- We provide a principled analysis of policy-gradient estimators, proving unbiasedness, deriving exact variance formulas, and bounding optimization loss under mild assumptions. This foundation yields convergence guarantees, an SNR-based adaptive learning-rate schedule, and a gradient-weighted optimal baseline for variance reduction beyond heuristic approaches.
- Instantiating the theory, OBLR-PO jointly adapts learning rates and baselines. Experiments on Qwen3-4B-Base and Qwen3-8B-Base show consistently improved stability and downstream performance over existing methods, demonstrating that the theoretical insights translate directly into practical gains in large-scale RL post-training.
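This is not the OBLR-PO implementation; it is a minimal numpy sketch of the two ingredients named above, under simplifying assumptions (a REINFORCE-style estimator with per-sample gradients available as a flat array): a gradient-weighted baseline and an SNR-shrunk learning rate. The rule `base_lr * snr / (1 + snr)` is an illustrative choice, not the paper's schedule.

```python
import numpy as np

def gradient_weighted_baseline(grads, returns):
    """b* = sum_i ||g_i||^2 R_i / sum_i ||g_i||^2: the classic closed-form
    variance-optimal baseline for a score-function (REINFORCE) estimator."""
    w = np.sum(grads ** 2, axis=1)               # per-sample squared grad norms
    return np.sum(w * returns) / np.sum(w)

def snr_adaptive_step(grads, returns, base_lr=1e-2):
    b = gradient_weighted_baseline(grads, returns)
    per_sample = (returns - b)[:, None] * grads  # per-sample PG estimates
    g_hat = per_sample.mean(axis=0)              # mini-batch gradient estimate
    var = per_sample.var(axis=0).sum() / len(returns)  # est. variance of g_hat
    snr = g_hat @ g_hat / (var + 1e-12)          # signal-to-noise ratio
    return base_lr * snr / (1.0 + snr) * g_hat   # shrink the step when noisy

rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 8))     # toy per-sample score-function gradients
returns = rng.normal(1.0, 2.0, 64)   # toy per-sample returns
print(snr_adaptive_step(grads, returns))
```

The intent of the shrinkage factor is simple: when the mini-batch gradient estimate is noisy (low SNR) the effective step size shrinks toward zero, and when it is reliable (high SNR) it approaches the base rate.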
🎓 Education
- Peking University, School of Mathematical Sciences (Sept. 2022 – Present)
  B.S. in Statistics; Elite Undergraduate Program for Applied Mathematics (Top 10%)
- University of California, Berkeley (Jan. 2025 – Aug. 2025)
  Visiting Student
🏅 Selected Honors and Awards
- Hong Sheng Scholarship (Top 12%), 2024
- First Prize, Chinese Mathematics Contest (Top 1%), 2024
- Yau Contest Group Prize (Top 5%), 2024
- Xiaomi Scholarship (Top 12%), 2023
- Gold Prize, Chinese Mathematical Olympiad, 2021
- Silver Prize, Chinese Mathematical Olympiad, 2020
📖 Selected Lecture Notes
😉 Misc
- I'm trying to learn latte art.
- I'm trying to learn how to mix drinks.
- I'm currently volunteering in rabbit care.