CV


Contact Information

Name: Haojin Yang
Professional Title: M.Eng. student
Email: yhj@stu.pku.edu.cn
Location: School of Software & Microelectronics, Peking University, Beijing, China

Professional Summary

M.Eng. student at Peking University. Research on reinforcement learning for large language models: multi-turn agentic RL, credit assignment, and diffusion language models.

Experience

  • 2026 - present

    Beijing, China

    Agent RL Research Intern
    StepFun — Foundation Models
    Advisor: Ruihang Miao. Post-training of Step3.6 / Step4 to improve agentic capability using real production traffic.
    • Built a multi-turn RL framework on VeRL with user-simulator rollouts and token-level masking / advantage estimation.
    • Designed turn-level and session-level reward functions (rule-based + generative); reward redesign yielded a 28.1% relative improvement over the initial version.
  • 2025 - 2026

    Beijing, China

    Foundation Algorithm Research Intern
    Meituan — Longcat Interaction
    Advisor: Jingqing Ruan. Post-training of Meituan’s in-house Longcat base model to improve persuasion ability in real sales scenarios.
    • Multi-turn RL framework on VeRL supporting rollouts with simulated users and per-token mask/advantage.
    • Reward design combining turn-level (rule-based) and session-level (generative) signals; +28.1% relative gain over the initial reward.
  • 2025

    Beijing, China

    Research Intern
    Microsoft Research Asia — DKI Excel Research
    Advisor: Ran Jia. RLHF, Excel agents, Deep Research.
    • Qwen-32B GRPO training for Python / Formula instruction control in Excel: built a multi-turn tool-use pipeline on VeRL with fine-grained rewards, achieving +17% accuracy over o4-mini.
    • Pivot-table dataset: defined a schema covering all pivot-table operations, built a batched generation pipeline, and collected 20k high-quality samples.
    • Search benchmark: built a semi-automated pipeline for evaluating large-scale search agents using Wikidata and curated sources with a voting-based validator.
  • 2024 - 2025

    Beijing, China

    LLM Algorithm Intern
    Baidu — TPG
    Advisor: Ziwei Jin. LoRA fine-tuning, GraphRAG, retrieval-augmented systems.
    • Text2Gremlin: LoRA-fine-tuned Qwen2.5-Coder on a 60k-sample dataset curated from 480k raw examples, lifting task accuracy from 70% to 87.2%.
    • RAG pipeline: introduced Text2Gremlin + intent classification, compressed QA latency from 9s to 3s in 60% of business scenarios and extended RAG to simple-reasoning queries.
    • Proposed a chunk-graph RAG that combines GraphRAG and chunk-based RAG for specific business scenarios.

Education

  • 2024 - 2027

    Beijing, China

    M.Eng.
    Peking University
    Software Engineering
    • Ranked 1/4200 in the comprehensive entrance evaluation
    • PKU Academic Excellence Award
  • 2020 - 2024

    Nanjing, China

    B.Eng.
    Nanjing University
    Software Engineering
    • National Scholarship (3rd class)

Awards

  • 2025
    PKU Academic Excellence Award
    Peking University
  • 2024
    PKU SSM Graduate Entrance — Ranked 1/4200
    Peking University

    Initial-exam score 436; ranked 1st in the comprehensive evaluation among ~4200 candidates.
  • 2023
    National Scholarship (3rd class)
    Ministry of Education, China

Skills

Programming (Proficient): Python, Java, C++, Shell
ML / RL (Proficient): PyTorch, VeRL, GRPO, RLHF, Multi-turn RL, Diffusion LMs, LoRA

Languages

Chinese: Native speaker
English: CET-6 score 632 (proficient)

Interests

Research interests: Reinforcement learning, Multi-turn agentic RL, Diffusion language models, Credit assignment, Tool use