2024-12-07
the o1 reinforcement fine-tuning is how you enable companies to automate entire realms of white-collar work and tasks. and eventually all those fine-tunes will get incorporated into o2 / orion / whatever comes next
OpenAI
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains, with better answer extraction (what OpenAI calls a grader, a [...
2024-04-19
looks like Llama 3 actually beats out older versions of GPT-4. it's pretty big that there's now an open-source model at that level. Llama 3 400B will be the moment of truth for beating out GPT-4-04-09 though
The Verge
Meta details Llama 3: 8B- and 70B-parameter models, a focus on reducing false refusals, and an upcoming model trained on 15T+ tokens that has 400B+ parameters