TEXXR

Chronicles

The story behind the story


OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data

… the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader) …

Kevin Weil / @kevinweil: Day 2 of the 12 days of OpenAI! 🎁 Today, something for developers: we're introducing Reinforcement Fine-Tuning, a new model customization technique for o1/o1-mini. RFT produces expert models in specific domains, and they're 🤯 good, with as few as a couple dozen examples.

Shitian Zhao / @zst96687522: OpenAI introduces reinforcement fine-tuning, the method they used internally while developing GPT-4o and o1. You submit your training data and define the grader for your task yourself.

George Grigorev / @iamgrigorev: OpenAI improved their fine-tuning API with o1 fine-tuning using "Reinforcement" fine-tuning (instead of a supervised one). They prepared a list of "graders", basically pre-defined reward functions, and they use true RL to make o1-mini task-specific.

Jordan Thibodeau / @jwthib: We are streaming OpenAI's latest launch! OpenAI launches Reinforcement Fine-Tuning. After the presentation is over, we will do a live demo of ChatGPT Pro and answer your questions. https://www.youtube.com/...

@scaling01: THIS IS HUGE FOR BUSINESS USE CASES. On day 2 of Shipmas, OpenAI introduces Reinforcement Fine-Tuning (RFT) for o1 models. With this, o1-mini can outperform o1 on very complex tasks like gene identification!

Ray Fernando / @rayfernando1337: Day 2: OpenAI o1 Reinforcement Fine-Tuning (RFT). Lets users fine-tune; knowledge can go from advanced high school to PhD for your own use cases. Preview launching early next year. Reason on a custom domain with just 12 examples.

Sam Altman / @sama: this works amazingly well; it has been one of my biggest surprises of 2024. excited to see what people build!

Tibor Blaho / @btibor91: OpenAI announced an expanded Reinforcement Fine-Tuning Research Program, allowing developers to customize AI models for domain-specific tasks by training them on datasets ranging from dozens to thousands of high-quality tasks and evaluating responses against reference answers.

Forums: Hacker News: OpenAI Reinforcement Fine-Tuning Research Program · r/OpenAI: OpenAI's Reinforcement Fine-Tuning Research Program
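Several of the quotes above describe a "grader" as a reward function that scores a model's answer against a reference answer, with that score driving the RL updates. A minimal sketch of the idea in plain Python (all names here are hypothetical illustrations, not OpenAI's actual grader API):

```python
def exact_match_grader(model_answer: str, reference_answer: str) -> float:
    """Return a reward in [0, 1]: 1.0 for an exact match after
    normalization, partial credit for token overlap, 0.0 otherwise."""
    def norm(s: str) -> str:
        return s.strip().lower()

    if norm(model_answer) == norm(reference_answer):
        return 1.0
    # Partial credit: fraction of reference tokens present in the answer.
    ref_tokens = set(norm(reference_answer).split())
    if not ref_tokens:
        return 0.0
    ans_tokens = set(norm(model_answer).split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens)

# In an RFT-style loop, each sampled completion on a training task would
# be scored by the grader and the score used as the RL reward signal.
print(exact_match_grader("BRCA1", "brca1"))       # 1.0
print(exact_match_grader("BRCA1", "BRCA1 TP53"))  # 0.5
```

The gene-identification example from the quotes fits this shape well: a task has a known correct answer, so a grader can score completions automatically without human labels, which is what lets a couple dozen examples go a long way.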

OpenAI