CV
Contact Information
| Name | Haojin Yang |
| Professional Title | M.Eng. student |
| Email | yhj@stu.pku.edu.cn |
| Location | School of Software & Microelectronics, Peking University, Beijing, China |
Professional Summary
M.Eng. student at Peking University, researching reinforcement learning for large language models: multi-turn agentic RL, credit assignment, and diffusion language models.
Experience
- Agent RL Research Intern, StepFun (Foundation Models), 2026 - present, Beijing, China
  Advisor: Ruihang Miao. Post-training of Step3.6 / Step4 to improve agentic capability using real production traffic.
  - Built a multi-turn RL framework on VeRL with user-simulator rollouts and token-level masking / advantage estimation.
  - Designed turn-level and session-level reward functions (rule-based + generative); the reward redesign yielded a 28.1% relative improvement over the initial version.
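The token-level masking named above can be sketched as a toy (this is an illustrative assumption, not the actual VeRL framework code; the role names and `build_loss_mask` helper are made up): only assistant-generated tokens receive a policy-gradient signal, while simulated-user and tool tokens are zeroed out of the loss.

```python
# Illustrative sketch of token-level loss masking in a multi-turn rollout.
# Only assistant-generated tokens are trained; user-simulator and tool
# tokens are masked out of the policy-gradient loss.

def build_loss_mask(turns):
    """turns: list of (role, token_ids) pairs. Returns a flat 0/1 mask
    aligned with the concatenated token sequence."""
    mask = []
    for role, token_ids in turns:
        keep = 1 if role == "assistant" else 0
        mask.extend([keep] * len(token_ids))
    return mask

rollout = [
    ("user", [11, 12, 13]),        # simulated-user turn: masked
    ("assistant", [21, 22]),       # model turn: trained
    ("tool", [31, 32, 33, 34]),    # tool output: masked
    ("assistant", [41, 42, 43]),   # model turn: trained
]
mask = build_loss_mask(rollout)
# mask -> [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
```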
- Foundation Algorithm Research Intern, Meituan (Longcat Interaction), 2025 - 2026, Beijing, China
  Advisor: Jingqing Ruan. Post-training of Meituan's in-house Longcat base model to improve persuasion ability in real sales scenarios.
  - Built a multi-turn RL framework on VeRL supporting rollouts with simulated users and per-token masking / advantage estimation.
  - Designed rewards combining turn-level (rule-based) and session-level (generative) signals; +28.1% relative gain over the initial reward.
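One simple way to combine turn-level and session-level signals like those described above is a weighted blend (a hypothetical sketch only; the weights, names, and broadcasting scheme are assumptions, not the reward design used in this work):

```python
# Hypothetical sketch: mix a per-turn rule-based reward with a single
# session-level (e.g. generative-judge) score by broadcasting the session
# score onto every turn.

def blend_rewards(turn_rewards, session_reward, w_turn=0.5, w_session=0.5):
    """turn_rewards: one rule-based score per turn.
    session_reward: one score for the whole session.
    Returns a blended per-turn reward list."""
    return [w_turn * r + w_session * session_reward for r in turn_rewards]

blend_rewards([0.0, 1.0, 0.5], session_reward=0.8)
# -> [0.4, 0.9, 0.65] (up to float rounding)
```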
- Research Intern, Microsoft Research Asia (DKI, Excel Research), 2025, Beijing, China
  Advisor: Ran Jia. RLHF, Excel agents, Deep Research.
  - Qwen-32B GRPO training for Python / Formula instruction control in Excel: built a multi-turn tool-use pipeline on VeRL with fine-grained rewards, achieving +17% accuracy over o4-mini.
  - Pivot-table dataset: defined a schema covering all operations, built a batched generation pipeline, and collected 20k high-quality samples.
  - Search benchmark: built a semi-automated pipeline for evaluating large-scale search agents using Wikidata and curated sources with a voting-based validator.
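For context on the GRPO training mentioned above: a common formulation of GRPO's credit assignment normalizes each sampled completion's reward against its group of samples for the same prompt. A minimal sketch (illustrative only, not the training code used in this work):

```python
# Group-relative advantage estimation as commonly formulated in GRPO:
# sample a group of completions per prompt, then normalize each
# completion's reward by the group's mean and standard deviation.
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """group_rewards: scalar rewards for one prompt's sampled group.
    Returns one advantage per completion."""
    mu = statistics.mean(group_rewards)
    sigma = statistics.pstdev(group_rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in group_rewards]

grpo_advantages([1.0, 0.0, 1.0, 0.0])
# -> approximately [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which is part of GRPO's appeal for LLM post-training.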
- LLM Algorithm Intern, Baidu (TPG), 2024 - 2025, Beijing, China
  Advisor: Ziwei Jin. LoRA fine-tuning, GraphRAG, retrieval-augmented systems.
  - Text2Gremlin: LoRA-fine-tuned Qwen2.5-Coder on a 60k-sample dataset curated from 480k raw examples, lifting task accuracy from 70% to 87.2%.
  - RAG pipeline: introduced Text2Gremlin + intent classification, cutting QA latency from 9s to 3s in 60% of business scenarios and extending RAG to simple-reasoning queries.
  - Proposed a chunk-Graph RAG combining GraphRAG and chunk-RAG for specific business scenarios.
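The LoRA fine-tuning above adapts a frozen weight W by training only two low-rank factors, giving an effective weight W + (alpha / r) * B @ A. A minimal arithmetic sketch with made-up shapes and values (illustrative, not the actual Qwen2.5-Coder setup):

```python
# LoRA arithmetic sketch: the adapted weight is W + (alpha / r) * B @ A,
# where only A (r x k) and B (d x r) are trained and W (d x k) stays frozen.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha, r):
    """Merge a LoRA adapter into the base weight."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, shape d x k
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[1.0, 1.0]]               # trained factor, r=1, k=2
B = [[0.5], [0.5]]             # trained factor, d=2, r=1
lora_weight(W, A, B, alpha=2, r=1)
# -> [[2.0, 1.0], [1.0, 2.0]]
```

With rank r much smaller than d and k, the trainable parameter count drops from d*k to r*(d+k), which is why LoRA is practical for adapting large models on modest hardware.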
Education
Selected Publications
Awards
- 2025: PKU Academic Excellence Award, Peking University
- 2023: National Scholarship (3rd class), Ministry of Education, China
- 2024: PKU SSM Graduate Entrance Examination, Peking University, ranked 1/4200
  Initial-exam score 436; ranked 1st in the comprehensive evaluation among ~4200 candidates.
Skills
Programming (Proficient): Python, Java, C++, Shell
ML / RL (Proficient): PyTorch, VeRL, GRPO, RLHF, Multi-turn RL, Diffusion LMs, LoRA
Languages
Chinese: native speaker
English: CET-6 score 632 (proficient)
Interests
Research interests: Reinforcement learning, Multi-turn agentic RL, Diffusion language models, Credit assignment, Tool use