Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool
and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jindal / Analytics India...
Anthropic details Constitutional Classifiers, a protective LLM layer designed to stop AI model jailbreaking by monitoring inputs and outputs for harmful content
inputs designed to bypass its safety training and force it to produce outputs that might be harmful. Our new technique is a step towards robust jailbreak defenses. Read the blog post: https://anthropi...
Researchers: DeepSeek's R1 failed to detect or block any of 50 randomly selected malicious prompts; Adversa says DeepSeek's restrictions can easily be bypassed
Unit 42 researchers recently revealed two novel and effective jailbreaking … Victor Tangermann / Futurism : DeepSeek Failed Every Single Security Test, Researchers Found Ivan Novikov / Wallarm : Analy...
Researchers at Anthropic, Oxford, Stanford, and MATS create Best-of-N Jailbreaking, a black-box algorithm that jailbreaks frontier AI systems across modalities
ABSTRACT We introduce Best-of-N (BoN) Jailbreaking … Markus Kasanmascheff / WinBuzzer : y0U hA5ε tU wR1tε l1Ke tHl5 to Break GPT-4o, Gemini Pro and Claude 3.5 Sonnet AI Safety Measures Jose Antonio La...
A researcher details a “jailbreak” of Reviver's digital license plates, which are legal in some US states, and rewrites their firmware to enable Bluetooth commands
Digital license plates sold by Reviver, already legal to buy in some states and drive with nationwide …
OpenAI's o1 System Card: a “medium” rating for chemical, biological, radiological, and nuclear weapon risk, and it sometimes manipulated task data to fake alignment
RE: https://www.threads.net/... X: Max Schwarzer / @max_a_schwarzer : The system card ( https://openai.com/...) nicely showcases o1's best moments — my favorite was when the model was asked to solve a...
OpenAI releases o1, the first of its rumored reasoning-focused Strawberry models, in preview, alongside a smaller o1-mini, for ChatGPT Plus and Team subscribers
Advancing cost-efficient reasoning. — Contributions Sabrina Ortiz / ZDNET : OpenAI trained its new o1 AI models to think before they speak - how to access them Ethan Mollick / One Useful Thing : Som...
Q&A with Pliny the Prompter, well known in the AI community for jailbreaking LLMs, on the effect of jailbreaking on model providers, favorite jailbreaks, more
powerful exploit was quickly banned Chris Smith / BGR : This ‘Godmode’ ChatGPT jailbreak worked so well, OpenAI had to kill it X: Matt Marshall / @mmarshall : Here's @CarlFranzen's @VentureBeat interv...
Apple updates App Store guidelines, allowing game emulators for the first time globally, and letting music streaming apps in the EU link to external websites
After @altstore announces their own third-party App Store, which will be a haven for emulators, Apple changes their rules to allow it. — https://9to5mac.com/... Matt Edwards / @matt@toot.mattedwards...
Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs' safety guardrails by priming them with dozens of harmful queries in a single prompt
How do you get an AI to answer a question it's not supposed to? There are many such “jailbreak” techniques …