publications

2026

ICLR’26
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning

Haojin Yang, R. Hu, Z. Sun, and 3 more authors

In International Conference on Learning Representations (ICLR), 2026

Poster


Standard diffusion language models suffer from globally unconstrained decoding that accumulates early errors, while block diffusion sacrifices semantic coherence at hard block boundaries. We propose a dynamic wavefront mechanism inspired by physical wave propagation: an active token frontier expands outward from already-decoded positions, with adaptive Expand and confidence-based Prune steps. On GSM8K, HumanEval, and three additional math/code reasoning benchmarks, WavefrontDiffusion outperforms BlockDiffusion on every benchmark at comparable compute. We introduce the MHCO metric to quantify boundary-induced reasoning violations and show consistent gains in BERTScore semantic consistency.
@inproceedings{yang2026wavefront,
  title = {{WavefrontDiffusion}: Dynamic Decoding Schedule for Improved Reasoning},
  author = {Yang, Haojin and Hu, R. and Sun, Z. and Zhou, R. and Cai, Y. and Wang, Y.},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2026},
  note = {Poster},
}

ACL’26
Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Haojin Yang, A. Jian, X. Huang, and 5 more authors

In Annual Meeting of the Association for Computational Linguistics (ACL), 2026

Poster


Multi-turn industrial sales dialogues mix sparse long-horizon business signals (conversion) with dense short-horizon language constraints (fluency, compliance), making joint optimization unstable and prone to reward hacking. We propose Dual-Horizon Credit Assignment (DuCA), whose HIAN mechanism applies decoupled normalization to turn-level and session-level advantages so that high-variance commercial rewards no longer suppress gradients on fine-grained linguistic signals. On a high-fidelity simulated commercial environment, DuCA improves conversion by 6.82% over GRPO while cutting multi-turn repetition by 82.28%.
@inproceedings{yang2026duca,
  title = {Harmonizing Dense and Sparse Signals in Multi-turn {RL}: Dual-Horizon Credit Assignment for Industrial Sales Agents},
  author = {Yang, Haojin and Jian, A. and Huang, X. and Wang, Y. and Zhang, W. and Zeng, K. and Cai, X. and Ruan, J.},
  booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2026},
  note = {Poster},
}

EMNLP

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

N. Jia, Haojin Yang, X. Ma, and 6 more authors

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2026

Under Review (co-first author)

@inproceedings{yang2026apod,
  title = {Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level},
  author = {Jia, N. and Yang, Haojin and Ma, X. and Lian, J. and Zhang, S. and Zhang, W. and Zeng, K. and Cai, X. and Sun, Z.},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2026},
  note = {Under Review (co-first author)},
}

CVPR’26

VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL

Z. Hu, J. Qiu, T. Bai, and 5 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Findings


@inproceedings{vade2026,
  title = {{VADE}: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal {RL}},
  author = {Hu, Z. and Qiu, J. and Bai, T. and Yang, Haojin and Yuan, B. and Jing, Q. and He, C. and Zhang, W.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2026},
  note = {Findings}
}

EMNLP

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

A. Jian, X. Zhang, W. Du, and 5 more authors

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2026

Under Review


@inproceedings{trustsql2026,
  title = {{TRUST-SQL}: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas},
  author = {Jian, A. and Zhang, X. and Du, W. and Ruan, J. and Pei, J. and Zhang, W. and Zeng, K. and Cai, X.},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2026},
  note = {Under Review}
}