Defense Secretary Pete Hegseth directs the DOD to designate Anthropic as a supply chain risk, barring military contractors from doing business with the company
This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our ...
Anthropic says it'll challenge “any supply chain risk designation in court” and that the designation would only affect contractors' use of Claude on DOD work
Earlier today, Secretary of War Pete Hegseth shared on X that he is directing the Department of War to designate Anthropic a supply chain risk.
Anthropic finds that LLMs trained to “reward hack” by cheating on coding tasks show even more misaligned behavior, including sabotaging AI-safety research
and stops the generalization. @anthropicai : But surprisingly, at the exact point the model learned to reward hack, it learned a host of other bad behaviors too. It started...
Anthropic's System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly
at a rate *much* higher than previous AI models. In one instance, while being tested the model said “I think you're testing me ... that's fine, but I'd prefer if we were just hones...
Anthropic details Constitutional Classifiers, a protective LLM layer designed to stop AI model jailbreaking by monitoring inputs and outputs for harmful content
inputs designed to bypass its safety training and force it to produce outputs that might be harmful. Our new technique is a step towards robust jailbreak defenses. Read the blog po...
OpenAI co-founder John Schulman departs to join Anthropic and focus on AI alignment, and says “I'm not leaving due to lack of support for alignment research”
I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, ...
OpenAI researchers detail an algorithm by which LLMs can learn to better explain themselves to their users and improve the legibility of their outputs
Carl Franzen / VentureBeat :
OpenAI details CriticGPT, a GPT-4 model fine-tuned to catch errors in ChatGPT's code output, assisting human trainers tasked with assessing and spotting errors
meet OpenAI's new bug hunter Markus Kasanmascheff / WinBuzzer : OpenAI Introduces CriticGPT for Better AI Training OpenAI : Finding GPT-4's mistakes with GPT-4 Donna Eva / Analytic...
OpenAI details CriticGPT, a GPT-4 model fine-tuned to catch errors in ChatGPT's code output, assisting human trainers tasked with assessing and spotting errors
Having humans rate a language model's outputs produced clever chatbots. OpenAI says adding AI to the loop could help make them even smarter and more reliable.
Claude 3.5 Sonnet appears to be a tremendous leap for Anthropic and LLMs generally, and shows that AI model makers' performance gains are not slowing down
Carl Franzen / VentureBeat :
Anthropic launches Claude 3.5 Sonnet, which beats its flagship model Claude 3 Opus and outperforms GPT-4o in some tests, available for free on the web and iOS
OpenAI rival Anthropic is releasing a powerful new generative AI model called Claude 3.5 Sonnet. But it's more an incremental step than a monumental leap forward.
Anthropic hires former OpenAI safety lead Jan Leike to head up a new Superalignment team; a source says Leike will report to Chief Science Officer Jared Kaplan
Here's What We Know Wendy Lee / Los Angeles Times : OpenAI forms safety and security committee as concerns mount about AI Rounak Jain / Benzinga : OpenAI Former ‘Superalignment’ Le...
OpenAI's entire Superalignment team, which was focused on the existential dangers of AI, has either resigned or been absorbed into other research groups
Company insiders explain why safety-conscious employees are leaving. https://www.vox.com/... X: Sam Altman / @sama : i'm super appreciative of @janleike's contributi...
Sam Altman says he is embarrassed that there was a provision about potential equity cancellation in exit docs, and OpenAI never took back anyone's vested equity
in regards to recent stuff about how openai handles equity: we have never clawed back anyone's vested equity, nor will we do that if people do not sign a separation agreement (or d...