Standard diffusion language models suffer from globally unconstrained decoding that accumulates early errors, while block diffusion sacrifices semantic coherence at hard block boundaries. We propose a dynamic wavefront mechanism inspired by physical wave propagation: an active token frontier expands outward from already-decoded positions, with adaptive Expand and confidence-based Prune steps. On GSM8K, HumanEval, and three additional math/code reasoning benchmarks, WavefrontDiffusion outperforms BlockDiffusion on every benchmark at comparable compute. We introduce the MHCO metric to quantify boundary-induced reasoning violations and show consistent gains in BERTScore semantic consistency.
@inproceedings{yang2026wavefront,
  title={{WavefrontDiffusion}: Dynamic Decoding Schedule for Improved Reasoning},
  author={Yang, Haojin and Hu, R. and Sun, Z. and Zhou, R. and Cai, Y. and Wang, Y.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  note={Poster},
}
ACL’26
Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents
Haojin Yang, A. Jian, X. Huang, and 5 more authors
In Annual Meeting of the Association for Computational Linguistics (ACL), 2026
Multi-turn industrial sales dialogues mix sparse long-horizon business signals (conversion) with dense short-horizon language constraints (fluency, compliance), making joint optimization unstable and prone to reward hacking. We propose Dual-Horizon Credit Assignment (DuCA), whose HIAN mechanism applies decoupled normalization to turn-level and session-level advantages so that high-variance commercial rewards no longer suppress gradients on fine-grained linguistic signals. On a high-fidelity simulated commercial environment, DuCA improves conversion by 6.82% over GRPO while cutting multi-turn repetition by 82.28%.
@inproceedings{yang2026duca,
  title={Harmonizing Dense and Sparse Signals in Multi-turn {RL}: Dual-Horizon Credit Assignment for Industrial Sales Agents},
  author={Yang, Haojin and Jian, A. and Huang, X. and Wang, Y. and Zhang, W. and Zeng, K. and Cai, X. and Ruan, J.},
  booktitle={Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2026},
  note={Poster},
}
EMNLP
Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level
N. Jia, Haojin Yang, X. Ma, and 6 more authors
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2026
We bridge on-policy RL exploitation and off-policy imitation at the token level via an asymmetric distillation scheme.
@inproceedings{yang2026apod,
  title={Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level},
  author={Jia, N. and Yang, Haojin and Ma, X. and Lian, J. and Zhang, S. and Zhang, W. and Zeng, K. and Cai, X. and Sun, Z.},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2026},
  note={Under Review (co-first author)},
}
CVPR’26
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
Z. Hu, J. Qiu, T. Bai, and 5 more authors
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
@inproceedings{vade2026,
  title={{VADE}: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal {RL}},
  author={Hu, Z. and Qiu, J. and Bai, T. and Yang, Haojin and Yuan, B. and Jing, Q. and He, C. and Zhang, W.},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
  note={Findings},
}
EMNLP
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
A. Jian, X. Zhang, W. Du, and 5 more authors
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2026
@inproceedings{trustsql2026,
  title={{TRUST-SQL}: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas},
  author={Jian, A. and Zhang, X. and Du, W. and Ruan, J. and Pei, J. and Zhang, W. and Zeng, K. and Cai, X.},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2026},
  note={Under Review},
}