Daily Papers 2025

melisa's Collections

updated 13 days ago

Phi-4-reasoning Technical Report

Paper • 2504.21318 • Published 18 days ago

Note
- Models have indeed acquired reasoning as a robust and transferable meta-skill.
- Microsoft’s Responsible AI standards.
- Additive property of the dataset: this additive structure remains central to the final SFT recipe when we further include alignment and general-domain data.
- To simplify tuning, we clustered data sources by (1) domain (e.g., math, code) and (2) quality, assigning the same weight to all members of a cluster.
- 16B tokens.
- o3-mini (medium) was more token-efficient.
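The cluster-weighting idea in the note can be illustrated with a small sketch. This is not the report's code. The source names, token counts, and cluster weights below are hypothetical. The sketch also assumes that a cluster weight acts as a per-token multiplier, similar to an upsampling factor, which is then normalized into sampling probabilities. The point is that tuning happens per (domain, quality) cluster rather than per source, so every member of a cluster shares the same weight.

```python
# Hypothetical data sources: name -> (domain, quality tier, tokens in millions).
# All names and numbers are illustrative, not taken from the report.
SOURCES = {
    "math_forum_a": ("math", "high", 400),
    "math_forum_b": ("math", "high", 100),
    "math_web":     ("math", "low", 500),
    "code_repo_a":  ("code", "high", 300),
    "code_repo_b":  ("code", "low", 700),
}

# Hypothetical tuning knobs: one weight per (domain, quality) cluster,
# instead of one weight per individual source.
CLUSTER_WEIGHTS = {
    ("math", "high"): 2.0,
    ("math", "low"): 0.5,
    ("code", "high"): 1.5,
    ("code", "low"): 0.5,
}


def sampling_probs(sources, cluster_weights):
    """Return per-source sampling probabilities.

    Assumption: each source's tokens are scaled by its cluster's weight,
    so all members of a cluster receive the same multiplier. The scaled
    token counts are then normalized to sum to 1.
    """
    effective = {
        name: cluster_weights[(domain, quality)] * tokens
        for name, (domain, quality, tokens) in sources.items()
    }
    total = sum(effective.values())
    return {name: weight / total for name, weight in effective.items()}


if __name__ == "__main__":
    for name, p in sampling_probs(SOURCES, CLUSTER_WEIGHTS).items():
        print(f"{name:>13}: {p:.3f}")
```

Because `math_forum_a` and `math_forum_b` share a cluster, their probabilities differ only by their token counts. Changing `CLUSTER_WEIGHTS[("math", "high")]` rescales both together. This sketch deliberately reduces the number of knobs to tune from one per source to one per cluster.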