Beyond “Aha!”: Systematic Meta‑Ability Alignment in Large Reasoning Models

1National University of Singapore, 2Tsinghua University, 3Salesforce AI Research

Abstract

Large reasoning models (LRMs) possess a latent capacity for long chain‑of‑thought reasoning, but the timing and consistency of emergent “aha” behaviors remain unpredictable. We explicitly align LRMs with three meta‑abilities—deduction, induction, and abduction—using automatically generated, self‑verifiable tasks. Our three‑stage pipeline (individual alignment, parameter‑space merging, and domain‑specific reinforcement learning) lifts performance ceilings by over 10% relative to instruction‑tuned baselines and delivers state‑of‑the‑art accuracy across math, coding, and science benchmarks.
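To make “self‑verifiable” concrete, the sketch below generates a toy deduction task whose ground‑truth label is computed by exhaustive forward chaining, so any model answer can be scored programmatically without human labels. The task family, function names, and reward rule here are illustrative assumptions, not the paper's exact data pipeline.

import random

def generate_deduction_task(num_vars: int = 4, num_rules: int = 4, seed: int = 0):
    """Build (premises, conclusion, label); the label is derived mechanically,
    so the task verifies itself."""
    rng = random.Random(seed)
    variables = [f"P{i}" for i in range(num_vars)]
    rules = [tuple(rng.sample(variables, 2)) for _ in range(num_rules)]  # a -> b
    facts = set(rng.sample(variables, 1))  # atoms asserted true
    conclusion = rng.choice(variables)

    # Forward-chain to closure: apply rules until no new atom is derived.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True

    premises = sorted(facts) + [f"{a} implies {b}" for a, b in rules]
    return premises, conclusion, conclusion in derived

def verify(model_answer: str, label: bool) -> bool:
    """Programmatic reward: exact-match check against the derived label."""
    return (model_answer.strip().lower() == "true") == label

premises, conclusion, label = generate_deduction_task(seed=42)
print(premises, "=> is", conclusion, "derivable?", label)
print(verify("True", label))  # usable as a binary reward in SFT or RL

Because the label is computed, not annotated, task difficulty can be scaled (more variables, more rules) while keeping verification exact.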

Three‑Stage Training Framework

Figure: Three‑stage meta‑ability alignment framework. Stage A: meta‑ability alignment ⟶ Stage B: parameter‑space merging ⟶ Stage C: domain‑specific RL.
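As a minimal sketch of Stage B, assuming the deduction, induction, and abduction checkpoints share one architecture, merging can be as simple as a weighted linear interpolation of parameters. The checkpoint paths and uniform weights below are hypothetical placeholders, not the paper's tuned recipe.

import torch

def merge_state_dicts(state_dicts, weights):
    """Linearly interpolate matching parameters across checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6, "merge weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage with three meta-ability checkpoints:
# sds = [torch.load(p, map_location="cpu")
#        for p in ("deduction.pt", "induction.pt", "abduction.pt")]
# merged = merge_state_dicts(sds, weights=(1/3, 1/3, 1/3))
# torch.save(merged, "merged.pt")

Stage C then runs domain‑specific RL from this merged checkpoint rather than from the raw base model, which is where the lifted performance ceiling comes from.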

Key Results

Tables 1 & 2: Meta‑ability alignment boosts reasoning performance at both 7B and 32B scales, with consistent gains across benchmarks.

BibTeX

@article{hu2025metaability,
  author  = {Hu, Zhiyuan and Wang, Yibo and Dong, Hanze and Xu, Yuhui and Saha, Amrita and Xiong, Caiming and Hooi, Bryan and Li, Junnan},
  title   = {Beyond ``Aha!'': Systematic Meta-Ability Alignment in Large Reasoning Models},
  journal = {arXiv preprint},
  year    = {2025}
}