I am Haojin Yang (杨昊锦), an M.Eng. student in Software Engineering at Peking University (2024–2027). I received my B.Eng. in Software Engineering from Nanjing University in 2024.

My research focuses on reinforcement learning for large language models — in particular multi-turn / agentic RL, credit assignment under sparse business rewards, and diffusion language models with adaptive decoding schedules.

I am currently a research intern at StepFun Foundation Models (since April 2026, advised by Ruihang Miao). Previously, I worked on multi-turn RL for industrial sales agents at Meituan Longcat Interaction (October 2025 to April 2026, advised by Jingqing Ruan), on RLHF for Excel agents at Microsoft Research Asia DKI (advised by Ran Jia), and on GraphRAG at Baidu TPG (advised by Ziwei Jin).

Feel free to reach out at yhj [at] stu [dot] pku [dot] edu [dot] cn.

School of Software & Microelectronics

Peking University

Beijing, China

news

Apr 2026	Started as Agent RL Research Intern at StepFun Foundation Models.
Apr 2026	Harmonizing Dense and Sparse Signals in Multi-turn RL (DuCA) is accepted to ACL 2026 (Poster).
Mar 2026	VADE is accepted to CVPR 2026 Findings.
Jan 2026	WavefrontDiffusion is accepted to ICLR 2026 (Poster). 🎉
Oct 2025	Started as Foundation Algorithm Research Intern at Meituan Longcat Interaction.

selected publications

ICLR’26
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning

Haojin Yang, R. Hu, Z. Sun, and 3 more authors

In International Conference on Learning Representations (ICLR), 2026

Poster


Standard diffusion language models suffer from globally unconstrained decoding, which lets early errors accumulate, while block diffusion sacrifices semantic coherence at hard block boundaries. We propose a dynamic wavefront mechanism inspired by physical wave propagation: an active frontier of tokens expands outward from already-decoded positions through adaptive Expand steps and confidence-based Prune steps. On GSM8K, HumanEval, and three additional math and code reasoning benchmarks, WavefrontDiffusion outperforms BlockDiffusion on every benchmark at comparable compute. We also introduce the MHCO metric to quantify boundary-induced reasoning violations, and show consistent gains in semantic consistency as measured by BERTScore.
@inproceedings{yang2026wavefront,
  title = {WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning},
  author = {Yang, Haojin and Hu, R. and Sun, Z. and Zhou, R. and Cai, Y. and Wang, Y.},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2026},
  note = {Poster},
}

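
To make the Expand/Prune idea in the WavefrontDiffusion abstract concrete, here is a toy sketch of a frontier-based decoding order. It is not the paper's implementation: the per-position `confidence` list is a fixed stand-in for model predictions (a real diffusion LM would re-predict them after every step), and `radius`, `threshold`, and the "commit the most confident position if nothing clears the threshold" fallback are illustrative assumptions.

```python
from typing import List, Set


def expand(decoded: Set[int], n: int, radius: int) -> Set[int]:
    """Expand: collect undecoded positions within `radius` of any decoded one."""
    frontier: Set[int] = set()
    for p in decoded:
        for q in range(p - radius, p + radius + 1):
            if 0 <= q < n and q not in decoded:
                frontier.add(q)
    return frontier


def prune(frontier: Set[int], confidence: List[float], threshold: float) -> Set[int]:
    """Prune: keep frontier positions whose confidence reaches `threshold`.

    Assumed fallback: if nothing clears the threshold, keep the single most
    confident position so every step commits at least one token.
    """
    kept = {q for q in frontier if confidence[q] >= threshold}
    if not kept and frontier:
        kept = {max(frontier, key=lambda q: confidence[q])}
    return kept


def wavefront_order(n: int, seeds: Set[int], confidence: List[float],
                    radius: int = 1, threshold: float = 0.9) -> List[List[int]]:
    """Return the positions committed at each step, starting from `seeds`.

    `confidence` is a fixed, hypothetical stand-in for model confidences.
    """
    if not seeds or radius < 1 or not all(0 <= s < n for s in seeds):
        raise ValueError("need valid seed positions and radius >= 1")
    decoded = set(seeds)
    order: List[List[int]] = []
    while len(decoded) < n:
        frontier = expand(decoded, n, radius)
        step = sorted(prune(frontier, confidence, threshold))
        order.append(step)
        decoded.update(step)
    return order


conf = [1.0, 0.95, 0.5, 0.97, 0.2, 0.99]
print(wavefront_order(6, {0}, conf, radius=2))
# [[1], [3], [5], [2], [4]]
```

In this toy run the frontier skips past the low-confidence positions 2 and 4, commits the confident tokens 3 and 5 first, and only returns to the uncertain positions once nothing better remains reachable, which is the qualitative behavior a confidence-pruned wavefront is meant to have.
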
ACL’26
Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Haojin Yang, A. Jian, X. Huang, and 5 more authors

In Annual Meeting of the Association for Computational Linguistics (ACL), 2026

Poster


Multi-turn industrial sales dialogues mix sparse long-horizon business signals (conversion) with dense short-horizon language constraints (fluency, compliance), which makes joint optimization unstable and prone to reward hacking. We propose Dual-Horizon Credit Assignment (DuCA), whose HIAN mechanism normalizes turn-level and session-level advantages separately, so that high-variance commercial rewards no longer drown out the gradients from fine-grained linguistic signals. In a high-fidelity simulated commercial environment, DuCA improves conversion by 6.82% over GRPO while reducing multi-turn repetition by 82.28%.
@inproceedings{yang2026duca,
  title = {Harmonizing Dense and Sparse Signals in Multi-turn {RL}: Dual-Horizon Credit Assignment for Industrial Sales Agents},
  author = {Yang, Haojin and Jian, A. and Huang, X. and Wang, Y. and Zhang, W. and Zeng, K. and Cai, X. and Ruan, J.},
  booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2026},
  note = {Poster},
}
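
The DuCA abstract argues that normalizing turn-level and session-level signals separately keeps a high-variance session reward from swamping the turn-level signal. The sketch below illustrates that effect with plain group-wise z-score normalization. It is a generic illustration, not the HIAN mechanism itself: the normalization scope (one group of sessions), the additive combination, `session_weight`, and the reward numbers are all hypothetical.

```python
import statistics
from typing import List

EPS = 1e-8  # guards against division by zero when a group has no variance


def zscore(xs: List[float]) -> List[float]:
    """Standardize values to zero mean and unit (population) std."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / (sd + EPS) for x in xs]


def decoupled_advantages(session_rewards: List[float],
                         turn_rewards: List[List[float]],
                         session_weight: float = 1.0) -> List[List[float]]:
    """Normalize the sparse session signal and the dense turn signal separately,
    then add them per turn (the additive combination is an assumption)."""
    a_sess = zscore(session_rewards)  # one value per session
    a_turn = zscore([r for turns in turn_rewards for r in turns])  # all turns in the group
    out, k = [], 0
    for i, turns in enumerate(turn_rewards):
        row = []
        for _ in turns:
            row.append(a_turn[k] + session_weight * a_sess[i])
            k += 1
        out.append(row)
    return out


def joint_advantages(session_rewards: List[float],
                     turn_rewards: List[List[float]]) -> List[List[float]]:
    """Baseline for contrast: add the session reward to each turn reward and
    normalize the mixture once."""
    raw = [s + r for s, turns in zip(session_rewards, turn_rewards) for r in turns]
    z = zscore(raw)
    out, k = [], 0
    for turns in turn_rewards:
        out.append(z[k:k + len(turns)])
        k += len(turns)
    return out


# Two sessions: one converts (hypothetical reward 100), one does not (0).
# In each, turn 0 is fluent/compliant (1.0) and turn 1 is not (0.0).
sess = [100.0, 0.0]
turns = [[1.0, 0.0], [1.0, 0.0]]
dec = decoupled_advantages(sess, turns)
joint = joint_advantages(sess, turns)
print(round(dec[0][0] - dec[0][1], 3))      # 2.0
print(round(joint[0][0] - joint[0][1], 3))  # 0.02
```

Under joint normalization the good and bad turns inside a session end up only about 0.02 apart, because the session reward's spread dominates the standard deviation; with decoupled normalization the within-session gap stays on the same scale as the between-session gap.
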

EMNLP

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

N. Jia, Haojin Yang, X. Ma, and 6 more authors

Submitted to the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2026

Under Review (co-first author)


@inproceedings{yang2026apod,
  title = {Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level},
  author = {Jia, N. and Yang, Haojin and Ma, X. and Lian, J. and Zhang, S. and Zhang, W. and Zeng, K. and Cai, X. and Sun, Z.},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2026},
  note = {Under Review (co-first author)},
}