M.Eng. student at Peking University. Foundation models, agentic RL, diffusion LMs.
I am Haojin Yang (杨昊锦), an M.Eng. student in Software Engineering at Peking University (2024–2027). I received my B.Eng. in Software Engineering from Nanjing University in 2024.
My research focuses on reinforcement learning for large language models, in particular multi-turn / agentic RL, credit assignment under sparse business rewards, and diffusion language models with adaptive decoding schedules.
I am currently a research intern at StepFun Foundation Models (since 2026-04, advised by Ruihang Miao). Previously, I worked on multi-turn RL for industrial sales agents at Meituan Longcat Interaction (2025-10 to 2026-04, advised by Jingqing Ruan), on RLHF for Excel agents at Microsoft Research Asia DKI (advised by Ran Jia), and on GraphRAG at Baidu TPG (advised by Ziwei Jin).
Feel free to reach out at yhj [at] stu [dot] pku [dot] edu [dot] cn.
School of Software & Microelectronics
Peking University
Beijing, China
news
| Date | News |
|---|---|
| Apr 2026 | Started as Agent RL Research Intern at StepFun Foundation Models. |
| Apr 2026 | Harmonizing Dense and Sparse Signals in Multi-turn RL (DuCA) is accepted to ACL 2026 (Poster). |
| Mar 2026 | VADE is accepted to CVPR 2026 Findings. |
| Jan 2026 | WavefrontDiffusion is accepted to ICLR 2026 (Poster). 🎉 |
| Oct 2025 | Started as Foundation Algorithm Research Intern at Meituan Longcat Interaction. |