marcymurninghan

2024-12-22

How 'bout ethical discernment, too? Moral agency matters! Quoting @techmeme.com: Anthropic research isn't meant to just show that these guardrails can be bypassed, but hopes that “generating extensive data on successful attack patterns” will open up “novel opportunities to develop better defense mechanisms.” …

404 Media

Researchers at Anthropic, Oxford, Stanford, and MATS create Best-of-N Jailbreaking, a black-box algorithm that jailbreaks frontier AI systems across modalities

ABSTRACT: We introduce Best-of-N (BoN) Jailbreaking …

Markus Kasanmascheff / WinBuzzer: y0U hA5ε tU wR1tε l1Ke tHl5 to Break GPT-4o, Gemini Pro and Claude 3.5 Sonnet AI Safety Meas...

2024-12-21

404 Media

Researchers at Anthropic, Oxford, Stanford, and MATS create Best-of-N Jailbreaking, a black-box algorithm that jailbreaks frontier AI systems across modalities

New research from Anthropic, one of the leading AI companies and the developer of the Claude family of large language models …
