2024-12-07
Teaching AI to “reason” in specific business domains. Earlier fine-tuning was about mimicking outputs. Reinforcement fine-tuning has the model reason through a problem, then rates how well it did. This trains the model to work through a problem in a specific way rather than just guess an output.
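A rough sketch of the "rate how it does" step, assuming the simplest kind of rating: pull the final answer out of the model's reasoning and compare it to a known reference. The `Answer:` marker and the names `extract_answer` and `grade` are made up for illustration; this is not OpenAI's grader API.

```python
import re
from typing import Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    # '.' does not match newlines, so each match stops at the end of its line.
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def grade(completion: str, reference: str) -> float:
    """Hypothetical grader: 1.0 for a case-insensitive exact match, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # no extractable answer earns no reward
    return 1.0 if answer.lower() == reference.strip().lower() else 0.0


# Three sampled completions for the same prompt, scored against the reference "12".
samples = [
    "The invoice lists 3 items at $4 each, so the total is $12.\nAnswer: 12",
    "Probably around ten.\nAnswer: 10",
    "I am not sure.",
]
rewards = [grade(s, "12") for s in samples]
print(rewards)
```

In an actual reinforcement fine-tuning run, scores like these would feed a policy update that makes high-reward reasoning more likely. That update step is left out here.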
OpenAI
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a [...