2024-12-07
the o1 reinforcement fine-tuning is how you enable companies to automate entire realms of white-collar work and tasks. and eventually all those fine-tunes will get incorporated into o2 / orion / whatever comes next
OpenAI
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains, with better answer extraction (what OpenAI calls a grader, a [...
2024-04-19
looks like Llama 3 actually beats out older versions of GPT-4. it's pretty big that there's now an open-source model at that level. Llama 3 400B will be the moment of truth for beating out GPT-4-04-09 though
The Verge
Meta details Llama 3: 8B- and 70B-parameter models, a focus on reducing false refusals, and an upcoming model trained on 15T+ tokens that has 400B+ parameters