summeryue0 · TEXXR

🚀 Instruction Following - SEAL Leaderboards are out! IF winners:
- GPT-4o and GPT-4 Turbo
- Llama 3 70B Instruct
- Mistral Large

Gemini Pro 1.5 leaps into top 3 in preference rankings, and Claude rockets to #2 in factuality. See https://scale.com/...

2024-05-30 View on X

SiliconANGLE

AI training data provider Scale AI releases SEAL Leaderboards, which use private datasets to rank LLMs in domains like coding, instruction following, and math


🚀 Coding - The first expert-evaluated SEAL Leaderboards are out! The coding race is neck and neck. Winners:
- GPT-4 Turbo and GPT-4o
- Gemini Pro 1.5
- Claude 3 Opus

See https://scale.com/... for a detailed analysis of each model!

2024-05-30 View on X


🚀 Math - We released GSM1k last month. Today, we augmented it with human ratings to account for chatty yet correct responses. Explore the GSM1k leaderboard as part of the SEAL Leaderboards. We were glad to see that LLMs have mostly nailed grade school math!

2024-05-30 View on X


🚀 Spanish - The first expert-evaluated SEAL Leaderboards are out! Spanish is our first multilingual leaderboard (https://scale.com/...). Winners:
- GPT-4o
- Gemini 1.5 Pro (post-I/O)
- GPT-4 Turbo

We plan to roll out more languages. Which ones should we build next?

2024-05-30 View on X


🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can't be gamed. Vetted experts handle the ratings, and we openly share our methods in detail! Check out our leaderboards at https://scale.com/...! Which evals should we build next?

2024-05-30 View on X
