Large reasoning models (LRMs) possess a latent capacity for long chain-of-thought reasoning, but the timing and consistency of emergent "aha" behaviors remain unpredictable. We explicitly align LRMs with three meta-abilities (deduction, induction, and abduction) using automatically generated, self-verifiable tasks. Our three-stage pipeline of individual alignment, parameter-space merging, and domain-specific reinforcement learning lifts the performance ceiling by over 10% relative to instruction-tuned baselines and delivers state-of-the-art accuracy across math, coding, and science benchmarks.
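To illustrate the parameter-space merging stage, here is a minimal sketch that linearly interpolates the weights of the three individually aligned checkpoints. The file paths and the uniform merging weights are hypothetical placeholders, not the tuned configuration from the paper; in practice the weights would be selected on a validation set.

```python
# Minimal sketch of parameter-space merging, assuming the three
# meta-ability checkpoints share one architecture. File names and the
# uniform weights below are hypothetical.
import torch

def merge_state_dicts(state_dicts, weights):
    """Return a weighted linear combination of matching parameters."""
    assert len(state_dicts) == len(weights)
    merged = {}
    for name in state_dicts[0]:
        # Sum the same parameter tensor across checkpoints, scaled by its weight.
        merged[name] = sum(w * sd[name] for sd, w in zip(state_dicts, weights))
    return merged

# Load the three individually aligned models (hypothetical paths).
checkpoints = [torch.load(path, map_location="cpu")
               for path in ("deduction.pt", "induction.pt", "abduction.pt")]
merged = merge_state_dicts(checkpoints, weights=(1/3, 1/3, 1/3))
torch.save(merged, "meta_ability_merged.pt")
```

The merged checkpoint then serves as the starting point for the third stage, domain-specific reinforcement learning.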
@article{hu2025metaability,
  author  = {Hu, Zhiyuan and Wang, Yibo and Dong, Hanze and Xu, Yuhui and Saha, Amrita and Xiong, Caiming and Hooi, Bryan and Li, Junnan},
  title   = {Beyond ``Aha!'': Systematic Meta-Ability Alignment in Large Reasoning Models},
  journal = {arXiv preprint},
  year    = {2025}
}