2024-12-07
OpenAI announces Reinforcement Fine-Tuning for its o1 reasoning model, which lets developers specialize o1 for a given domain. Apparently it works with as few as a dozen examples.
OpenAI
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a [...