2024-05-30
🚀 Instruction Following - SEAL Leaderboards are out! Instruction following winners:
- GPT-4o and GPT-4 Turbo
- Llama 3 70B Instruct
- Mistral Large
Gemini Pro 1.5 leaps into the top 3 in preference rankings, and Claude rockets to #2 in factuality. See https://scale.com/...
SiliconANGLE
AI training data provider Scale AI releases SEAL Leaderboards, which use private datasets to rank LLMs in domains like coding, instruction following, and math
🚀 Coding - The first expert-evaluated SEAL Leaderboards are out! The coding race is neck and neck. Winners:
- GPT-4 Turbo and GPT-4o
- Gemini Pro 1.5
- Claude 3 Opus
See https://scale.com/... for a detailed analysis of each model!
🚀 Math - We released GSM1k last month. Today, we augmented it with human ratings to account for chatty yet correct responses. Explore the GSM1k leaderboard as part of the SEAL Leaderboards. We were glad to see that LLMs have mostly nailed grade school math!
🚀 Spanish - The first expert-evaluated SEAL Leaderboards are out! Spanish is our first multilingual leaderboard (https://scale.com/...). Winners:
- GPT-4o
- Gemini 1.5 Pro (post-I/O)
- GPT-4 Turbo
We plan to roll out more languages. Which ones should we build next?
🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can't be gamed. Vetted experts handle the ratings, and we openly share our methods in detail! Check out our leaderboards at https://scale.com/...! Which evals should we build next?